Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Workaround for issue in spdx tools related #549

Merged
merged 1 commit into from
Dec 22, 2017
Merged

Workaround for issue in spdx tools related #549

merged 1 commit into from
Dec 22, 2017

Conversation

goneall
Copy link
Member

@goneall goneall commented Dec 22, 2017

@goneall goneall merged commit ea988c4 into master Dec 22, 2017
@@ -35,7 +35,7 @@
the licensed material may still be restricted for other reasons, including because others have
copyright or other rights in the material. A licensor may make special requests, such as asking that
all changes be marked or described. Although not required by our licenses, you are encouraged to
respect those requests where reasonable. More<alt match=" |_" name="spaceUnderscore"> </alt>considerations for the public<alt match=": wiki.creativecommons.org/Considerations_for_licensees|\." name="considerationsLicensees">: wiki.creativecommons.org/Considerations_for_licensees</alt></p>
respect those requests where reasonable. More<alt match=" ?|_" name="spaceUnderscore"> </alt>considerations for the public<alt match=": wiki.creativecommons.org/Considerations_for_licensees|\." name="considerationsLicensees">: wiki.creativecommons.org/Considerations_for_licensees</alt></p>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we want this, because we don't want to match Moreconsiderations. We only want to match More considerations or the upstream-typo More_considerations.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Replacing ? with {1} would match the right things and maybe still work as a workaround?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just tried the {1} and it does not work with the current tools.

One alternative that would work is:

<alt match="More considerations|More_considerations" name="spaceUnderscore">More Considerations</alt>

Let me know if that is preferred and I'll test and commit the changes.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

BTW - I'm not personally too concerned about matching Moreconsiderations. Although technically not correct, I don't see any harm in matching this pattern as the author very likely intended More considerations. Perhaps something we can fix in a future revision of the license list?

Copy link
Contributor

@wking wking Dec 27, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just tried the {1} and it does not work with the current tools.

Is that a Java limitation? It works in Python:

$ python3.4 -c 'import re; print(re.compile(" {1}|a").search("b c")).groups()'
<_sre.SRE_Match object; span=(1, 2), match=' '>

(both with and without the {1}). And in JavaScript:

` {1}|a`.match(' ')  // Array [ " " ]

(at least as implemented by Firefox 52.4).

<alt match="More considerations|More_considerations" name…

I prefer shorter <alt> sections, but this would work if we cannot agree on a workaround for shorter <alt>s.

I'm not personally too concerned about matching Moreconsiderations.

Maybe. I don't like no longer matching (in a future release) something we explicitly claimed to match (in the next release). I'd rather have the long-<alt> approach than allow Moreconsiderations, because that way we know we don't have to support Moreconsiderations in the future (unless upstream or others start using it in the wild without our prompting). But that said, the chances of someone else looking at our regexp that closely seem low. So I wouldn't feel too bad about breaking backwards compat if that's the only approach we can agree on.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is that a Java limitation?

No.

The issue has to do with the design of the license matching algorithm in the tools.

In matching, white space must be ignored (per the license matching guidelines), so the string is tokenized and the white space is removed when comparing text.

Sometimes the trailing blanks are consumed in this process and is not presented to the regular expression processor when it finds an alt tag.

Basically, if the pattern does not end on token boundaries, it has issues. I've worked around several instances with some rather ugly code, but this case is more challenging.

Since this is a design issue, it is difficult to fix in the code.

I've thought about a complete redesign going to a character by character parsing. This would take significant effort and I am concerned about the resultant performance impact.

If someone has an alternative implementation of the matching algorithm fully implementing the legal team matching guidelines, we could use that. However, as far as I can tell this is the only implementation.

@goneall
Copy link
Member Author

goneall commented Dec 27, 2017

I'll submit a separate PR with the longer alt tags.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants