Add profile for multiple SPDX files with short licensing/copyright info #502

Open
mxmehl opened this issue Mar 23, 2021 · 20 comments
Labels
profile: licensing Licensing Profile and related matters
@mxmehl

mxmehl commented Mar 23, 2021

As suggested by @goneall, I would like to propose a new SPDX profile for the 3.0 spec. At REUSE, we're looking for a more flexible and human-editable way to bulk-license files, currently under the working title "REUSE.yaml", so that we can deprecate our DEP5-based approach. We would love for this to be compatible with SPDX.

Goals

  • Ability to mark multiple files at once, including via wildcards.
  • Mark files relative to the location of this SPDX file, thereby allowing multiple such files.
  • Minimum required information: one or more SPDX-FileCopyrightText entries, and an SPDX-License-Identifier.
  • Use the same keys as the established SPDX tags (see above).

Rationale

Obviously, a full SPDX file is neither readable nor maintainable for average developers. There should be a way to specify only the copyright and licensing of one or more files in a very concise manner. So that developers don't have to learn a different syntax, the tag names SPDX-FileCopyrightText and SPDX-License-Identifier should stay the same. Ideally, all information would be applied relative to the SPDX file's path (without being able to declare files in directories above its own location).

Scenario: a maintainer has marked all source code files with in-file comment headers, following the REUSE best practices. For a directory with 500 icon files, however, they would prefer to declare copyright and licensing in bulk. The easiest way to do this would be to create a YAML/JSON file inside the repo, use * as the target, and add the copyright and licensing information.

Ideas for implementation

We've collected some syntax proposals for REUSE.yaml in this thread. For example:

- src/*:
    SPDX-FileCopyrightText:
      - 2020 Me
      - © 2017 You
    SPDX-License-Identifier: MIT

or

- src/*:
    license:
      SPDX-License-Identifier: MIT
    copyright: |
      SPDX-FileCopyrightText: 2020 Me
      SPDX-FileCopyrightText: © 2017 You

We are open about the exact syntax, but it would be wise not to make it much more complex than this.
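To make the first syntax concrete: a consuming tool would expand each pattern relative to the manifest's own directory and, per the goals above, never reach above it. The following is a minimal Python sketch under stated assumptions. It assumes the YAML has already been loaded into plain lists and dicts (the standard library has no YAML parser), and the name expand_manifest and the "later entries win" rule are hypothetical choices, not part of any spec.

```python
from pathlib import Path

# Hypothetical in-memory form of the first proposed syntax, i.e. roughly what
# a YAML loader would return for it: a list of single-key mappings.
EXAMPLE_MANIFEST = [
    {
        "src/*": {
            "SPDX-FileCopyrightText": ["2020 Me", "© 2017 You"],
            "SPDX-License-Identifier": "MIT",
        }
    },
]


def expand_manifest(manifest, manifest_dir):
    """Map each matched file (path relative to manifest_dir) to its info."""
    base = Path(manifest_dir).resolve()
    result = {}
    for entry in manifest:
        for pattern, info in entry.items():
            # Patterns are resolved relative to the manifest's own directory;
            # pathlib's "*" does not cross "/", while "**" recurses.
            for path in sorted(base.glob(pattern)):
                if not path.is_file():
                    continue  # e.g. "src/*" also matches subdirectories
                # Skip anything that resolves outside the manifest's tree
                # (via ".." or a symlink): no declaring files above it.
                if not path.resolve().is_relative_to(base):
                    continue
                # Assumption: a later entry overrides an earlier one.
                result[path.relative_to(base).as_posix()] = info
    return result
```

Requires Python 3.9+ for Path.is_relative_to. Note that src/* deliberately does not reach src/sub/; whether it should is exactly the kind of pattern rule the spec would need to pin down.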

However, I am aware that some of these proposals stretch the general idea of SPDX files and perhaps also the new profiles. I am excited to learn what you folks are thinking about this very simple approach.

@swinslow
Member

Hi @mxmehl, thanks for this! Wanted to add my quick initial thoughts:

I really like the idea of something like this, as a lightweight format for developers to use to express this sort of data.

I'm hesitant to call this sort of file, itself, an "SPDX document". Whether we're talking about the current SPDX 2.2 spec or the current thinking for 3.0, an SPDX document has aspects that are important to retain for its intended use cases, and those are absent here. As I understand it, the intended concept for 3.0 "profiles" is the "base" set of fields for SPDX core elements, plus additional fields. I don't think a format that omits planned mandatory fields would align with the approaches that have been discussed.

What I'd propose instead is to treat something like this as a "pre-SPDX manifest" (I don't really have a name for it). In other words, a manifest file format that can easily be consumed (by a script, CI/CD system, a GitHub action, etc.) to output an actual, full-fledged SPDX document.

That script / action can do all the hard work of collecting hashes, outputting the data in true SPDX format, etc., and can use this manifest file as an input to fill in the appropriate corresponding fields. The SPDX document that it generates can then be incorporated into the build artifacts that are published every time a version is released.

That way the project developers don't themselves need to manage the SPDX document details, but there is a standardized way to consume this "pre-SPDX" data to generate a true SPDX document.
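To illustrate the "script / action" step, here is a minimal Python sketch that turns one resolved manifest entry into an SPDX 2.x tag-value File section, computing the SHA1 checksum the developer never has to maintain. Mapping the bulk-declared license to LicenseConcluded, using NOASSERTION for LicenseInfoInFile, and the SPDXRef-File-N naming are assumptions for illustration; a real generator would also emit the document and package sections.

```python
import hashlib
from pathlib import Path


def spdx_file_entry(base_dir, rel_path, info, index):
    """Render one SPDX 2.x tag-value File section for a manifest entry."""
    data = (Path(base_dir) / rel_path).read_bytes()
    sha1 = hashlib.sha1(data).hexdigest()
    copyrights = info["SPDX-FileCopyrightText"]
    if isinstance(copyrights, str):  # allow a single string or a list
        copyrights = [copyrights]
    return "\n".join([
        f"FileName: ./{rel_path}",
        f"SPDXID: SPDXRef-File-{index}",
        f"FileChecksum: SHA1: {sha1}",
        # Assumption: the bulk-declared license becomes the concluded license.
        f"LicenseConcluded: {info['SPDX-License-Identifier']}",
        # The file itself was not scanned, so nothing is asserted about it.
        "LicenseInfoInFile: NOASSERTION",
        "FileCopyrightText: <text>" + "\n".join(copyrights) + "</text>",
    ])
```

The developer only edits the manifest; hashes and SPDX identifiers are regenerated on every release.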

Does that make sense? Not sure if I'm explaining well but that's my off-the-cuff reaction here.

@goneall
Member

goneall commented Mar 25, 2021

@swinslow Since I was the one who suggested a profile to @mxmehl, I thought I would add an opinion. We have done a "minus" type profile with the SPDX-Lite proposal from the Asia group, so we do have some precedent. That being said, your suggestion of using a different name to describe the subset makes sense to me.

The only thing I have a strong opinion on is having the lightweight manifest information be part of the SPDX specification and consistent with its terms. This will make it easy for consumers to integrate it into their tooling (as you stated above) and will also give producers confidence that it will fit into the larger ecosystem.

@swinslow
Member

Thanks @goneall! Sounds great. Wholeheartedly agree with having the lightweight manifest defined as part of the SPDX spec and consistent with its terms.

@mxmehl
Author

mxmehl commented Mar 26, 2021

Thank you for sharing your thoughts! I can't say much about what to call this or how it integrates into your current schemes.

What I like about the "pre-SPDX" data is that it may allow more freedom regarding location (flexible/relative) and field names (SPDX-License-Identifier/SPDX-FileCopyrightText) than a boiled-down SPDX document, which may have stricter formal requirements. On the other hand, if the latter's formal requirements turn out to be flexible enough, I'd be fine with that as well.

In general, I wholeheartedly agree that having this as part of the specification makes total sense for both SPDX and REUSE, and I hope we'll find a good solution.

@zvr
Member

zvr commented Mar 26, 2021

An enthusiastic +1 on this proposal.

A few thoughts:

  1. I would strongly suggest using only existing, valid SPDX tags and not introducing any other information. Therefore, in the example presented above I'd object to introducing groupings like license and copyright.
  2. I would also assume that, although @mxmehl's proposal only mentions copyright and license info, we would be fine with representing any file-level attribute in such a form, like Contributor or potentially security information.
  3. Finally, I think the right way to integrate this into the spec would be in a manner similar to the section "How to use SPDX inside Files" (and not as a profile). Something like "How to provide bulk information on Files". This info could then be processed (by reading the actual filenames and expanding the *, for example) to produce an SPDX document.

@mxmehl, any interest in trying to integrate even more REUSE stuff in SPDX spec?
I can totally see a similar section about how the ExtractedText of licenses can be stored in a LICENSES directory, using the license identifiers as filenames, for example.
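The LICENSES idea is easy to check mechanically. The Python sketch below reports identifiers used in license expressions that have no matching license text. It assumes texts are stored as LICENSES/<identifier>.<extension> (e.g. LICENSES/MIT.txt; a file without an extension would be misread), and it uses a deliberately simplistic tokenizer rather than a full SPDX expression parser. The function names are hypothetical.

```python
import re
from pathlib import Path

OPERATORS = {"AND", "OR", "WITH", "and", "or", "with"}


def identifiers_in(expression):
    """Return the license/exception identifiers used in an SPDX expression."""
    tokens = re.findall(r"[^\s()]+", expression)
    # Drop operators and the "+" (or-later) suffix.
    return {t.rstrip("+") for t in tokens if t not in OPERATORS}


def missing_license_texts(expressions, repo_root):
    """List identifiers that have no LICENSES/<identifier>.<ext> file."""
    licenses_dir = Path(repo_root) / "LICENSES"
    available = set()
    if licenses_dir.is_dir():
        # The filename stem is the identifier, e.g. "Apache-2.0.txt".
        available = {p.stem for p in licenses_dir.iterdir() if p.is_file()}
    used = set()
    for expression in expressions:
        used |= identifiers_in(expression)
    return sorted(used - available)
```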

Thinking more about it, all this falls under the umbrella "add SPDX information" (to a repo/directory/package) instead of "produce an SPDX document". This is an additional direction I'm happy to have in the spec.

@mxmehl
Author

mxmehl commented Apr 1, 2021

@zvr Thanks for your feedback. I can't say much more about how to integrate it into SPDX, but I am happy to give feedback on concrete proposals from a (REUSE) user perspective. Again, think of me as the advocate of the average Jane/Joe developer who is confused by all this legal stuff ;)

@mxmehl, any interest in trying to integrate even more REUSE stuff in SPDX spec?

Sure, why not? The LICENSES directory is an invention of REUSE and is slowly being picked up by other initiatives, e.g. in coreinfrastructure/best-practices-badge#1547

@goneall
Member

goneall commented Apr 28, 2021

From the SPDX tech call on 27 April 2021:

  • In general good support for this proposal
  • General consensus that we would prefer license information to be as close to the file as possible (e.g. SPDX tags within source files), but for communities that have metadata in the root directory (e.g. Debian), this proposal would provide better information
  • This proposal would cover non-source files (e.g. binary images)
  • REUSE would provide a defined precedence, preferring information in the file itself, followed by a license file, followed by this proposed file format
  • Tools could be written to use this format to update the specific source files
  • See https://spdx.github.io/spdx-spec/appendix-IX-file-tags/ for where these "File Tags" were defined in SPDX 2.2 and https://reuse.software/spec/ for the broader REUSE spec.
  • Issue raised about adding a file after the metadata is created - should we have some type of Package Verification Code?
  • The SPDX legal team will review the proposal and may provide additional feedback
  • General consensus that this would not constitute an SPDX document, but rather a standard approach to documenting metadata that could be used to create an SPDX document
  • This could be added to the existing appendix IX or added as a new chapter (preference to amending)
  • Discussed if we should use the SPDX properties/terms or have a set of rules on how to translate SPDX terms to terms used in this proposal
    • Consensus on having rules - compatible with other REUSE documents and already defined in appendix IX
    • For the license, keep it as "SPDX-License-Identifier:" as a recognized exception to the rules
  • This file could be anywhere in the directory
  • Interest in adding package information in this metadata file as well
  • Adding any file or package metadata may support use cases beyond the legal profile use cases
    • Need to be careful this doesn't overlap too much with the SPDX-Lite profile
  • For the proposals in this link, consensus to avoid option number 4
    • If there is no preference among 1 through 3, choose the one most compatible with the SPDX YAML format
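On the Package Verification Code question: SPDX 2.x defines it as a SHA1 over the sorted, concatenated SHA1 digests of the package's files, so recomputing it reveals files added after the metadata was written. A minimal Python sketch follows. Walking a whole directory and the excluded parameter (for files such as the SPDX document itself, which the spec lets you exclude) are simplifications for illustration.

```python
import hashlib
from pathlib import Path


def package_verification_code(root, excluded=()):
    """SPDX 2.x-style verification code over all files under root.

    `excluded` holds "/"-separated paths relative to root (e.g. the SPDX
    document itself) that do not contribute to the code.
    """
    root = Path(root)
    digests = []
    for path in root.rglob("*"):
        if path.is_file() and path.relative_to(root).as_posix() not in excluded:
            digests.append(hashlib.sha1(path.read_bytes()).hexdigest())
    # Sort the lowercase hex digests, concatenate them, and hash the result.
    digests.sort()
    return hashlib.sha1("".join(digests).encode("ascii")).hexdigest()
```

Storing this value alongside the bulk metadata and comparing it on the next run would flag a newly added file that the metadata may not yet cover.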

@mxmehl
Author

mxmehl commented Apr 28, 2021

Excellent summary, thank you! I'd just like to emphasise that this "metadata, pre-document file" is limited to its own directory and its subdirectories, so it cannot bulk-define attributes of files in parent directories.

Regarding the precedence discussion, please see issue fsfe/reuse-docs#70 in which option 3 is currently the favourite. It should not concern SPDX directly but may give some context and assurance that REUSE will take care of resolving conflicts.
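The precedence mentioned in the tech call (information in the file itself, then a license file, then the bulk file) could be resolved per file roughly as in this Python sketch. It assumes "license file" means REUSE's adjacent <filename>.license file, assumes the bulk file has already been expanded into a {path: tags} dict, and uses a deliberately crude line scanner instead of a real comment parser. read_tags and resolve_info are hypothetical names.

```python
from pathlib import Path

TAGS = ("SPDX-FileCopyrightText", "SPDX-License-Identifier")


def read_tags(text):
    """Crudely collect tag values from text; return None if there are none."""
    found = {tag: [] for tag in TAGS}
    for line in text.splitlines():
        for tag in TAGS:
            marker = tag + ":"
            if marker in line:
                found[tag].append(line.split(marker, 1)[1].strip())
    return found if any(found.values()) else None


def resolve_info(path, bulk):
    """Return (source, tags) using: in-file, then .license file, then bulk."""
    p = Path(path)
    # 1. Tags in the file itself (binary files simply yield nothing).
    tags = read_tags(p.read_text(encoding="utf-8", errors="ignore"))
    if tags:
        return "in-file", tags
    # 2. A "<name>.license" file next to it.
    sidecar = p.with_name(p.name + ".license")
    if sidecar.is_file():
        tags = read_tags(sidecar.read_text(encoding="utf-8"))
        if tags:
            return "license-file", tags
    # 3. The bulk metadata file, already expanded to {path: tags}.
    if path in bulk:
        return "bulk", bulk[path]
    return "none", None
```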

@zvr
Member

zvr commented May 21, 2021

So, where shall the discussion on the actual specification of this file take place?
There is a need to:

  • define exact syntax (including clear pattern matching rules); and
  • specify conflict resolution rules
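As one possible reading of "clear pattern matching rules", here is a Python sketch that matches a pattern against the whole "/"-separated path relative to the file: "*" and "?" never cross "/", "**/" matches zero or more whole directories, and a trailing "**" matches everything below. These rules are similar in spirit to gitignore-style globbing but are an assumption, not something this thread settled; compile_pattern and matches are hypothetical names.

```python
import re


def compile_pattern(pattern):
    """Translate a glob-style pattern into a regular expression."""
    out = []
    i = 0
    while i < len(pattern):
        if pattern.startswith("**/", i):
            out.append("(?:[^/]+/)*")  # zero or more whole directories
            i += 3
        elif pattern.startswith("**", i):
            out.append(".*")  # anything, including "/"
            i += 2
        elif pattern[i] == "*":
            out.append("[^/]*")  # within a single path segment
            i += 1
        elif pattern[i] == "?":
            out.append("[^/]")
            i += 1
        else:
            out.append(re.escape(pattern[i]))
            i += 1
    return re.compile("".join(out))


def matches(pattern, rel_path):
    """True if rel_path (relative to the metadata file) matches in full."""
    return compile_pattern(pattern).fullmatch(rel_path) is not None
```

For example, under these rules src/* covers src/a.png but not src/sub/a.png, while src/**/*.png covers both.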

@mxmehl
Author

mxmehl commented Jun 22, 2021

There are two issues where we can discuss the points raised by @zvr:

I'd love to get feedback on both points so we can prepare a well-founded suggestion.

@kestewart
Contributor

For consideration: include this earlier in 2.3, or in 3.0? To be discussed.

@swinslow swinslow added the profile: licensing Licensing Profile and related matters label Mar 9, 2022
@kestewart kestewart modified the milestones: 2.3, 3.0 May 10, 2022
@kestewart
Contributor

Per discussion in the call, leaning towards leaving it in 3.0 as it's a profile, but need to sync up with @mxmehl

@mxmehl
Author

mxmehl commented May 16, 2022

Per discussion in the call, leaning towards leaving it in 3.0 as it's a profile, but need to sync up with @mxmehl

Obviously we'd be happy to have it in 2.3, as it's a frequently requested feature, but we're happy to discuss that with you.

@goneall
Member

goneall commented May 18, 2022

@zvr @mxmehl @kestewart I'm thinking if we get a PR within the next week, we can review and potentially include it in the 2.3 release. Let me know if you agree.

@mxmehl
Author

mxmehl commented May 23, 2022

Great! How shall we proceed? I am afraid I lack the detailed insider knowledge of the SPDX spec required to draft a pull request that won't raise too many side issues regarding exact wording and placement. However, I'd be very happy to provide input and feedback early on. One could take fsfe/reuse-docs#81 as inspiration.

One thought that crossed my mind is whether we want SPDX-License-Identifier and SPDX-FileCopyrightText in such a file at all, as tools might interpret these strings as the license/copyright of the file itself. On the other hand, we don't want to confuse people with a different syntax.

@goneall
Member

goneall commented May 23, 2022

@mxmehl Looking back through the thread, it looks like adding a separate "Annex" (previously called Appendix) would be the approach for adding this to the spec. The format could be similar to the SPDX Lite Annex.

I won't have much bandwidth to help drafting since I'm pretty booked with other SPDX 2.3 activities, but I can help review.

Adding @swinslow @jlovejoy to the thread since the General Meeting notes above indicated legal team review was of interest.

The timeframe may be a bit tight to get this into 2.3 - @zvr @kestewart any thoughts?

@mxmehl
Author

mxmehl commented May 24, 2022

OK, no promise but I can try to kickstart a pull request creating an annex to get the ball rolling.

@goneall
Member

goneall commented Apr 4, 2024

Since we're a couple of weeks away from 3.0, I'm moving this to a 3.1 milestone.

@goneall goneall modified the milestones: 3.0, 3.1 Apr 4, 2024
@silverhook
Contributor

@mxmehl , is this now made irrelevant with REUSE 3.2’s reuse.yaml?

@mxmehl
Author

mxmehl commented Aug 5, 2024

@mxmehl , is this now made irrelevant with REUSE 3.2’s reuse.yaml?

Indeed. REUSE progressed on its own since the demand was so high.

It would, however, make sense if SPDX acknowledged this procedure in an Annex or so.


6 participants