-
Notifications
You must be signed in to change notification settings - Fork 279
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Workaround for issue in spdx tools related #549
Conversation
Signed-off-by: Gary O'Neall <[email protected]>
@@ -35,7 +35,7 @@ | |||
the licensed material may still be restricted for other reasons, including because others have | |||
copyright or other rights in the material. A licensor may make special requests, such as asking that | |||
all changes be marked or described. Although not required by our licenses, you are encouraged to | |||
respect those requests where reasonable. More<alt match=" |_" name="spaceUnderscore"> </alt>considerations for the public<alt match=": wiki.creativecommons.org/Considerations_for_licensees|\." name="considerationsLicensees">: wiki.creativecommons.org/Considerations_for_licensees</alt></p> | |||
respect those requests where reasonable. More<alt match=" ?|_" name="spaceUnderscore"> </alt>considerations for the public<alt match=": wiki.creativecommons.org/Considerations_for_licensees|\." name="considerationsLicensees">: wiki.creativecommons.org/Considerations_for_licensees</alt></p> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think we want this, because we don't want to match Moreconsiderations
. We only want to match More considerations
or the upstream-typo More_considerations
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Replacing ?
with {1}
would match the right things and maybe still work as a workaround?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just tried the {1} and it does not work with the current tools.
One alternative that would work is:
<alt match="More considerations|More_considerations" name="spaceUnderscore">More Considerations</alt>
Let me know if that is preferred and I'll test and commit the changes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
BTW - I'm not personally too concerned about matching Moreconsiderations. Although technically not correct, I don't see any harm in matching this pattern as the author very likely intended More considerations. Perhaps something we can fix in a future revision of the license list?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just tried the
{1}
and it does not work with the current tools.
Is that a Java limitation? It works in Python:
$ python3.4 -c 'import re; print(re.compile(" {1}|a").search("b c")).groups()'
<_sre.SRE_Match object; span=(1, 2), match=' '>
(both with and without the {1}
). And in JavaScript:
` {1}|a`.match(' ') // Array [ " " ]
(at least as implemented by Firefox 52.4).
<alt match="More considerations|More_considerations" name…
I prefer shorter <alt>
sections, but this would work if we cannot agree on a workaround for shorter <alt>
s.
I'm not personally too concerned about matching Moreconsiderations.
Maybe. I don't like no longer matching (in a future release) something we explicitly claimed to match (in the next release). I'd rather have the long-<alt>
approach than allow Moreconsiderations
, because that way we know we don't have to support Moreconsiderations
in the future (unless upstream or others start using it in the wild without our prompting). But that said, the chances of someone else looking at our regexp that closely seem low. So I wouldn't feel too bad about breaking backwards compat if that's the only approach we can agree on.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is that a Java limitation?
No.
The issue has to do with the design of the license matching algorithm in the tools.
In matching, white space must be ignored (per the license matching guidelines), so the string is tokenized and the white space is removed when comparing text.
Sometimes the trailing blanks are consumed in this process and is not presented to the regular expression processor when it finds an alt tag.
Basically, if the pattern does not end on token boundaries, it has issues. I've worked around several instances with some rather ugly code, but this case is more challenging.
Since this is a design issue, it is difficult to fix in the code.
I've thought about a complete redesign going to a character by character parsing. This would take significant effort and I am concerned about the resultant performance impact.
If someone has an alternative implementation of the matching algorithm fully implementing the legal team matching guidelines, we could use that. However, as far as I can tell this is the only implementation.
I'll submit a separate PR with the longer alt tags. |
Work around for issue Match expressions for license templates fail when using white space
Signed-off-by: Gary O'Neall [email protected]