Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Qt-LGPL-exception-1.1: Add new ID for Nokia-Qt-exception-1.1 #627

Merged
merged 4 commits into from
Jun 14, 2018

Conversation

wking
Copy link
Contributor

@wking wking commented Mar 23, 2018

And deprecate the old ID, which had two problems:

  • Qt no longer belongs to Nokia, and they've changed the title upstream to reflect that.
  • Qt has another exception for GPL3, and the old ID didn't mention the target license.

This commit adds the new identifier and deprecates the old one in its favor.

I've used Qt-LGPL-exception-1.1 instead of Kai's suggested Qt-exception-LGPL-1.1 to match the {name}-{target}-exception-{version} pattern suggested by the existing i2p-gpl-java-exception.

I've set an <alt> in the exception title so it will match the old “Nokia ” prefix, the current “The Qt Company ” prefix, or no prefix at all. I've used “The Qt Company ” prefix in the test file to match the upstream text (the test is a copy/paste of the upstream text with trailing whitespace removed).

Builds on #626; review that first. Someone with write access should label this “new license/exception request”.

CC @kkoehne.

@wking
Copy link
Contributor Author

wking commented Mar 23, 2018

@goneall, is the test failure because of a space/<alt> matching issue in the tools? If so, do you have suggestions for working around the issue?

@silverhook
Copy link
Collaborator

I agree.

And if memory serves me right, (a form of) the exception predates Nokia’s acquisition of Qt as well.

Qt got passed on as such: Trolltech ↦ Nokia ↦ Digia↦ Qt Company – which further strengthens the suggestion that a more generic name should be used.

There is also the Free Qt Foundation, but that’s not the primary concern here :)

@goneall
Copy link
Member

goneall commented Mar 23, 2018

It turns out there are 2 problems causing the tests to fail.

Having a null match in the alt match match="|Nokia |The Qt Company " is matching the empty string first and ignoring the matches for The Qt Company , so it doesn't skip those tokens. To solve this, optional can be placed around the alt and the match and not including the empty string in the match choices.

The second problem is including the trailing space in the matching text. Since the matching software tokenizes and removes white space, it doesn't include the space in the match. To work around this, just don't include any preceding or trailing white space in the match.

Replacing <alt name="nokia" match="|Nokia |The Qt Company ">The Qt Company </alt> with <optional><alt name="nokia" match="Nokia|The Qt Company">The Qt Company </alt></optional> should work.

@wking
Copy link
Contributor Author

wking commented Mar 24, 2018

Having a null match in the alt match...

Is this something we can fix in the parser? We already have some of these in master (e.g. here).

<alt name="nokia" match="Nokia|The Qt Company">The Qt Company </alt>

Including a space in the inner text but not in the match might break a canonical parser. I think I'll need to accept an unnecessary leading space and use:

<alt name="nokia" match="|Nokia|The Qt Company">The Qt Company</alt>

with the trailing space outside the <alt>.

@goneall
Copy link
Member

goneall commented Mar 24, 2018

Is this something we can fix in the parser?

I could not come up with a good general fix (I'll describe the problem in more detail below).

I did find another possible work-around. If you order the match so that the empty match is at the end (e.g. match="Nokia|The Qt Company|"), it will work. I'm not sure if this work-around is implementation specific or something that is in the Regular Expression specification. The pattern matcher seems to try to match the options in order.

The specific issue is the algorithm used to the the alt matching:

  1. Identify the text that may apply to the alt pattern by looking ahead to the next non-alt and non-optional text areas
  2. Create a regular expression matcher based on the match=... pattern provided
  3. Attempt to match using the regular expression
  4. If the pattern does not match - just continue matching at the current text position
  5. If the pattern does match, move the cursor forward to after the match, then continue matching

The problem is number 5 above - if an empty choice is matched (and it will always match), the cursor is moved forward by zero tokens (e.g. it doesn't move), this leaves the cursor at its current position and doesn't match the variable text we want it to. The match then fails when it applies the next matching rule.

I can not think of a general design that would avoid this situation.

We've had these in this repository since the exception landed in
e355521 (Adding exception: Nokia-Qt-exception-1.1, 2017-02-05, spdx#354),
and they've been in license-list-data since the exception landed there
in spdx/license-list-data@cec28dc7 (Updated JSON exception files,
2016-04-16).  I haven't bothered with an <alt> tag because the
matching guidelines have [1]:

  5.1.3 Guideline: Quotes Any variation of quotations (single, double,
  curly, etc.) should be considered equivalent.

That doesn't explicitly say that you can add/remove quotes in the
general case.  And maybe that's intentional.  But if we treat the
whole unit as a quote (going from quad-quotes to double-quotes) then
this change should have no impact on matching.

In practice, I think it's unlikely that folks copied the exception
with the quad-quotes.  And the upstream linked from our XML is pinned
to a specific revision and only uses double-quotes.

[1]: https://spdx.org/spdx-license-list/matching-guidelines
And deprecate the old ID, which had two problems [1]:

* Qt no longer belongs to Nokia, and they've changed the title
  upstream to reflect that.
* Qt has another exception for GPL3, and the old ID didn't mention the
  target license.

This commit adds the new identifier and deprecates the old one in its
favor.

I've used Qt-LGPL-exception-1.1 instead of Kai's suggested
Qt-exception-LGPL-1.1 to match the {name}-{target}-exception-{version}
pattern suggested by the existing i2p-gpl-java-exception.

I've set an <alt> in the exception title so it will match the old
"Nokia " prefix, the current "The Qt Company " prefix, or no prefix at
all.  I've used "The Qt Company " prefix in the test file to match the
upstream text (the test is a copy/paste of the upstream text with
trailing whitespace removed).

[1]: https://lists.spdx.org/pipermail/spdx-legal/2018-March/002511.html
     Subject: New License/Exception Request: Qt-LGPL-exception-1.1
     Date: Fri, 23 Mar 2018 15:39:54 +0000
     Message-ID: <HE1PR0202MB2571F0B9F0305FA2150A428F98A80@HE1PR0202MB2571.eurprd02.prod.outlook.com>
@wking
Copy link
Contributor Author

wking commented Mar 24, 2018

Rebased onto master to pick up #392 so we can add an <obsoletedBy> entry.

spdx-tools can't handle this at the moment:

On Fri, Mar 23, 2018 at 11:36:30PM +0000, Gary O'Neall wrote [1]:
> The second problem is including the trailing space in the matching
> text.  Since the matching software tokenizes and removes white
> space, it doesn't include the space in the match.  To work around
> this, just don't include any preceding or trailing white space in
> the match.

And while the extra leading whitespace in the no-prefix case isn't
very elegant, it's not a problem either.

[1]: spdx#627 (comment)
@goneall
Copy link
Member

goneall commented Mar 25, 2018

I did some more research on the regex matching and confirmed that the order of Regex alternation does matter and is consistent between Regex engine implementations.

Based on this information, the pattern should be changed from match="|Nokia|The Qt Company" to match="Nokia|The Qt Company|" This is a solution and not a work-around as the matches are working properly.

The problem with the current match is that the empty alternation matches first (and always matches) and the rest of the regex pattern isn't even examined. Since the company name (in this case) is not consumed by the regex matching, the rest of the pattern match will fail.

There is no need to surround the <alt> ... </alt> by <optional>...</optional>

We should check the order of the other alts where empty alternations are in the match string - the empty alternation should always be last.

The alternation operator (|) is commutative in the POSIX spec, which
has [1]:

> If the pattern permits a variable number of matching characters and
> thus there is more than one such sequence starting at that point,
> the longest such sequence is matched.

Testing with a few engines, here are some that match POSIX:

  $ echo abc | grep -Eo '|a|ab'
  ab
  $ python -c 'import regex; print(regex.match("|a|ab", "abc", regex.POSIX).group(0))'
  ab  # Background in https://bitbucket.org/mrabarnett/mrab-regex/issues/150
  $ psql -Atc "select substring('abc' from '|a|ab')"
  ab

The docs for PostgreSQL are somewhat complicated, but for two or more
branches connected by the | operator they are always greedy [2].

Here are some engines that prefer the left-most branch:

  $ python -c 'import re; print(re.match("|a|ab", "abc").group(0))'
  $ python -c 'import regex; print(regex.match("|a|ab", "abc").group(0))'
  $ node -e 'console.log("abc".match(/|a|ab/)[0])'  # [3]
  $ ruby -e 'print "abc".match(/|a|ab/); print "\n"'  # [4]

Go's stdlib provides both left-most-branch [5] and longest-match [6]
implementations.

So the old order was compliant with POSIX EREs (as referenced in
schema/ListedLicense.xsd), but this commit sorts the branches for
longest-match-first for compatibility with engines that break POSIX
and prefer the left-most branch.

[1]: http://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap09.html#tag_09_01_02
[2]: https://www.postgresql.org/docs/10/static/functions-matching.html#POSIX-MATCHING-RULES
[3]: https://developer.mozilla.org/en-US/docs/Web/JavaScript/Guide/Regular_Expressions#special-or
[4]: https://ruby-doc.org/core-2.5.0/Regexp.html#class-Regexp-label-Alternation
     Although these docs don't have much to say on longest-match
     vs. left-most branch.
[5]: https://golang.org/pkg/regexp/#Compile
[6]: https://golang.org/pkg/regexp/#CompilePOSIX
@wking
Copy link
Contributor Author

wking commented Mar 25, 2018

I did some more research on the regex matching and confirmed that the order of Regex alternation does matter and is consistent between Regex engine implementations.

Interesting. I did some digging too, and it looks like the POSIX | is commutative, preferring the longest overall match. But it also looks like many (but not all) regexp engines prefer the left-most-branch match instead for performance reasons. I've pushed b6b27fd here to reorder the matches so that longest-match and left-most-branch coincide, which is a workaround that allows us to support those left-most-branch implementations. More details in the b6b27fd commit message.

@wking
Copy link
Contributor Author

wking commented Mar 25, 2018

We should check the order of the other alts where empty alternations are in the match string - the empty alternation should always be last.

I've filed #629 adding left-most-branch compatibility to the regexps which are already in master.

@jlovejoy jlovejoy added this to the 3.2 release milestone Apr 5, 2018
@bradleeedmondson
Copy link
Contributor

Discussed on legal call today; agree to adding new ID and deprecating old ID that references Nokia.

@jlovejoy jlovejoy merged commit 163db0a into spdx:master Jun 14, 2018
bradleeedmondson added a commit that referenced this pull request Mar 5, 2019
Deprecate libpng in favor of LibPNG-1.0
* add LibPNG-1.0.xml, with notes about deprecation, and line-wrap at about 80 chars
* update libpng.xml to deprecated status, with deprecation License List version, and deprecation notes
* add test file LibPNG-1.0.txt, leaving old test file Libpng.txt in place per other deprecations (e.g., #627)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants