Support a similarity search by full license text #19

Open
sschuberth opened this issue Apr 13, 2022 · 1 comment

Comments

@sschuberth

Given an arbitrary full license text (as plain text), it would be cool if we could do a similarity search and list those entries whose texts are similar. The metric for "similar" would need to be defined, and maybe some sort of threshold would need to be configurable; but maybe that's already going too far, and a simple similarity search with sane defaults would suffice.
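
To make the request a bit more concrete, here is a minimal sketch of what a "simple similarity search with sane defaults" might look like. It is only an illustration: the Jaccard word-set metric, the `similar_licenses` name, the `known_texts` mapping of license keys to texts, and the default threshold of 0.5 are all assumptions of mine, not anything the project provides or has agreed on.

```python
import re


def tokenize(text):
    """Lowercase the text and return its set of alphanumeric word tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a, b):
    """Jaccard similarity of two token sets: |A & B| / |A | B|."""
    if not a and not b:
        return 1.0  # two empty texts are treated as identical
    return len(a & b) / len(a | b)


def similar_licenses(query_text, known_texts, threshold=0.5):
    """Return (key, score) pairs scoring at or above threshold, best first.

    known_texts is a hypothetical mapping of license key -> full license text.
    """
    query = tokenize(query_text)
    scored = (
        (key, jaccard(query, tokenize(text)))
        for key, text in known_texts.items()
    )
    return sorted(
        ((key, score) for key, score in scored if score >= threshold),
        key=lambda pair: pair[1],
        reverse=True,
    )
```

A word-set metric ignores word order and repetition, so it is cheap but coarse; a real implementation would probably want something stricter.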

@pombredanne
Contributor

@sschuberth That's an excellent idea! We may have an aspiring contributor in @adii21-Ux for a closely related GSoC project idea: https://github.com/nexB/aboutcode/wiki/GSOC-2022#scancodeio--scancode-toolkit-create-web-application-to-scan-and-review-a-single-license-text

The idea is to run ScanCode license detection on some input text, which is essentially the similarity search needed for this!
I think we could extend this to add a threshold and present not just one single match but all the matches found, with their scores, and show a visual diff of sorts. This could be extended to other text similarity metrics too.
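
As a rough sketch of the "all matches with scores, plus a visual diff" part: the snippet below uses Python's standard `difflib` as a stand-in metric. It is not how ScanCode license detection works, and the `ranked_matches` and `text_diff` names, as well as the `known_texts` mapping of license keys to texts, are hypothetical.

```python
import difflib


def ranked_matches(query_text, known_texts, threshold=0.0):
    """Score every known text against the query and return all matches
    at or above threshold as (key, score) pairs, best first."""
    query_words = query_text.split()
    results = []
    for key, text in known_texts.items():
        # Compare word sequences; ratio() is 2 * matches / total words.
        matcher = difflib.SequenceMatcher(
            None, query_words, text.split(), autojunk=False
        )
        score = matcher.ratio()
        if score >= threshold:
            results.append((key, score))
    return sorted(results, key=lambda pair: pair[1], reverse=True)


def text_diff(query_text, reference_text, key="reference"):
    """Return a unified diff from the reference license text to the input."""
    return "\n".join(
        difflib.unified_diff(
            reference_text.splitlines(),
            query_text.splitlines(),
            fromfile=key,
            tofile="input",
            lineterm="",
        )
    )
```

Swapping in another metric would only mean replacing the scoring line inside `ranked_matches`; the threshold and ranking logic would stay the same.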

@sschuberth changed the title from "Support a similarity search by full license tests" to "Support a similarity search by full license text" on Apr 13, 2022