Use BCP 14 or inspiration of that in Conventions document #546

Open
5 tasks
erget opened this issue Sep 20, 2024 · 4 comments
Labels
enhancement Proposals to add new capabilities, improve existing ones in the conventions, improve style or format

Comments

@erget
Member

erget commented Sep 20, 2024

The Conventions document could be made clearer by removing ambiguities around certain words. BCP 14 handles this in a way that is simple and clear. It's straightforward to adopt BCP 14 or to be inspired by it in such a way that users profit, similarly to how we've been inspired by Semantic Versioning without adopting it wholesale.

We believe that this can be implemented by mid-2025. Once it is implemented, all future pull requests will benefit. We expect it to be merged first in CF-1.13.

If you want to work on this, please self-assign or ping me - that'll help us keep track. These people have participated in the discussions up till now (I may be forgetting someone, sorry!):
@mraspaud @davidhassell @JonathanGregory @larsbarring @cofinoa @feggleton @DocOtak

I will keep this issue up to date as multiple PRs will likely be required in order to implement this.

Steps to complete

  • @larsbarring will post a version of the Conventions with annotations on the BCP 14 controlled vocab as well as "extended vocab" that we should consider rewording to match BCP 14. In the hackathon we also discussed augmenting the extended vocab with "Suggest, allow, permit, forbid, prohibit".
  • We decide whether we want to adopt BCP 14 or simply get inspired by it. The main question at the moment is whether we want to use all caps on controlled vocab, as is REQUIRED by BCP 14. Some people like that and others aren't so sure, so we should try it and see how we like it.
  • We pen a text stating how we are using BCP 14. Are we using it wholesale? Do we extend it to additional words? Do we use it without uppercasing? @feggleton has expressed interest in contributing to this. Then potentially in parallel:
  • We divide up the Conventions document and check the occurrences of the controlled vocab, rewording if necessary. Probably it makes sense to gather a coalition of the willing and work in parallel, merging into a single branch. Currently there are ~1k occurrences so this is a tractable problem as long as we don't allow CF to be rewritten several times by AIs.
  • We develop a pre-merge action to check for use of controlled vocab and highlight that, asking the user to confirm that we're using any introduced controlled terms consistently.

The pre-merge action could be something like (very draft):

#!/bin/bash

# Are we on a pull request?
if [ -z "$GITHUB_HEAD_REF" ]; then
  echo "This script is meant to run on a pull request."
  exit 1
fi

TARGET_BRANCH=${GITHUB_BASE_REF:-main}

# Controlled vocab as an extended regex (draft: the BCP 14 keywords;
# the "NOT" forms are caught via MUST/SHALL/SHOULD).
vocab='MUST|SHALL|SHOULD|REQUIRED|RECOMMENDED|MAY|OPTIONAL'
vocab_found=""

# Files changed relative to the target branch
# (this simple loop does not handle file names containing whitespace)
diff_output=$(git diff origin/"$TARGET_BRANCH"... --name-only)
for file in $diff_output; do
  # Get added lines in each file, skipping the "+++" file header
  added_lines=$(git diff origin/"$TARGET_BRANCH"... --unified=0 -- "$file" | grep -E '^\+' | grep -vE '^\+\+\+')

  # Search for controlled vocab as whole words, ignoring case
  if echo "$added_lines" | grep -qiwE "$vocab"; then
    vocab_found=1
    echo "Controlled vocabulary found in $file:"
    echo "$added_lines" | grep -iwE "$vocab"
    echo
  fi
done

if [ -n "$vocab_found" ]; then
  echo "Controlled vocabulary was found in your changes."
  echo "Please verify that these words are used in line with the guidelines set forth in:"
  # Would need to make this link point to the right section which doesn't exist yet!
  echo "https://cfconventions.org/Data/cf-conventions/cf-conventions-1.11/cf-conventions.html#_overview"
  exit 1
else
  echo "No controlled vocabulary found in added lines."
fi
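
For comparison, the same check can be sketched in Python, which makes it easier to match whole phrases and to test the logic without a repository. This is only an illustration under stated assumptions: the function names `keyword_regex` and `flagged_added_lines` are hypothetical, the keyword list is the ten keywords from RFC 2119, and the input is assumed to be the text printed by `git diff --unified=0`.

```python
import re

# Assumption: the ten RFC 2119 keywords; the final list is still to be decided.
BCP14 = [
    "MUST NOT", "SHALL NOT", "SHOULD NOT",
    "MUST", "REQUIRED", "SHALL", "SHOULD",
    "RECOMMENDED", "MAY", "OPTIONAL",
]


def keyword_regex(phrases):
    """Compile a case-insensitive, whole-word regex for the given phrases."""
    # Longest phrases first, so "MUST NOT" is reported instead of a bare "MUST".
    ordered = sorted(phrases, key=len, reverse=True)
    alternatives = [
        r"\s+".join(re.escape(word) for word in phrase.split())
        for phrase in ordered
    ]
    return re.compile(r"\b(?:" + "|".join(alternatives) + r")\b", re.IGNORECASE)


def flagged_added_lines(diff_text, pattern):
    """Return (file, added line, matches) for added lines containing keywords.

    Assumes git-style unified diff text: "diff --git" starts each file,
    "+++ b/<path>" names it, and "@@" starts each hunk.
    """
    hits = []
    current_file, in_hunk = None, False
    for line in diff_text.splitlines():
        if line.startswith("diff --git "):
            in_hunk = False
        elif line.startswith("@@"):
            in_hunk = True
        elif not in_hunk and line.startswith("+++ "):
            path = line[4:]
            current_file = path[2:] if path.startswith("b/") else path
        elif in_hunk and line.startswith("+"):
            found = [m.group(0) for m in pattern.finditer(line[1:])]
            if found:
                hits.append((current_file, line[1:], found))
    return hits


if __name__ == "__main__":
    sample = (
        "diff --git a/ch04.adoc b/ch04.adoc\n"
        "--- a/ch04.adoc\n"
        "+++ b/ch04.adoc\n"
        "@@ -10,0 +11,2 @@\n"
        "+The attribute must not be empty.\n"
        "+A mayor change with no keywords.\n"
    )
    for file, line, found in flagged_added_lines(sample, keyword_regex(BCP14)):
        print(f"{file}: {found}: {line}")
```

Tracking whether we are inside a hunk avoids mistaking an added line that happens to begin with `++` for a file header, and the word boundaries keep "mayor" from being reported as "MAY".
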
@erget erget added the enhancement Proposals to add new capabilities, improve existing ones in the conventions, improve style or format label Sep 20, 2024
@erget erget self-assigned this Sep 20, 2024
@erget
Member Author

erget commented Sep 20, 2024

@sadielbartholomew welcome aboard 😊

@DocOtak
Member

DocOtak commented Sep 23, 2024

I am of the somewhat strong opinion that if we want to adopt BCP 14, it SHOULD be done without extensions, and not as something merely inspired by it. I feel the point is to add the same rigor (or feeling of rigor) that I've come to associate with the IETF's RFCs and with some rare OGC standards, e.g. CoverageJSON. When I view a document and see those keywords in all caps, I know I'm dealing with BCP 14 and don't need to go look up the document's own custom definitions. What would I say to someone who made their own data standard that was simply "inspired by" CF? The extended keyword list should be used only to identify phrasing that we might consider rewording to use the BCP 14 keywords.

Perhaps we could consider RFC 6919 next April 1.

@larsbarring
Contributor

@DocOtak, you do have a good point regarding "cherry-picking", and I agree. At the same time I think that the whole endeavour is somewhat broader in a first phase:

Clarify [for ourselves] what we actually mean when using the words from the BCP14 and "extended BCP14" lists. Is a "should" really a should or do we in fact mean shall or must (etc.)? And then update the text so that it is clear and consistent. At the same time I agree that we should (SHOULD or MUST?) try to replace "extended BCP14" words with those from BCP14, but that might not always be possible.

The question is then what to do with any remaining "extended BCP14" words, e.g. deprecated comes to mind. Do we want to leave them as is (without a specified meaning), or do we want to somehow clarify what we mean? I think that is a next step.

And, then, a last step would be to explore ways to render the relevant words in the documents.
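
To get a feel for the size of that first phase, one could tally how often each keyword occurs and whether it is already written in all caps. The following is a rough sketch only: `tally_keywords` is a hypothetical helper, the keyword list is the ten RFC 2119 keywords, and the input is assumed to be the plain text of a chapter.

```python
import re
from collections import Counter

# Assumption: the ten RFC 2119 keywords.
BCP14 = [
    "MUST NOT", "SHALL NOT", "SHOULD NOT",
    "MUST", "REQUIRED", "SHALL", "SHOULD",
    "RECOMMENDED", "MAY", "OPTIONAL",
]


def tally_keywords(text, phrases):
    """Count each phrase, separating all-caps uses from other capitalizations."""
    # Longest phrases first, so "should not" is not counted as "should".
    ordered = sorted(phrases, key=len, reverse=True)
    pattern = re.compile(
        r"\b(?:" + "|".join(r"\s+".join(p.split()) for p in ordered) + r")\b",
        re.IGNORECASE,
    )
    upper, other = Counter(), Counter()
    for match in pattern.finditer(text):
        found = match.group(0)
        # Normalize case and whitespace so "must\nnot" is counted as "MUST NOT".
        key = " ".join(found.upper().split())
        if found.isupper():
            upper[key] += 1
        else:
            other[key] += 1
    return upper, other


if __name__ == "__main__":
    text = "Variables MUST have units. The axis should not be reversed, but it may be."
    upper, other = tally_keywords(text, BCP14)
    print("all caps:", dict(upper))
    print("other:   ", dict(other))
```

The lower-case and mixed-case counts are the occurrences that would need a human decision: each is either a genuine requirement to be reworded or an ordinary English use of the word.
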

@larsbarring
Contributor

larsbarring commented Oct 2, 2024

Regarding the first step ("@larsbarring will post a version of the Conventions with annotations...."), there is now an HTML version available here:

https://github.com/larsbarring/cf-conventions/tree/my_bcp-14/BCP-14/conventions_build

Download the cf-conventions.html and open it in your browser. To get a quick first impression you can go to Chapter 4, "Coordinate Types". Clearly, the rendering is not intended to be used in the final document, only to highlight the words (hopefully in a color-neutral way) without changing typeface or capitalization.

The specific phrase list is as follows:

BCP14 = [
    "MUST NOT", "SHALL NOT", "SHOULD NOT",
    "MUST", "REQUIRED", "SHALL", "SHOULD",
    "RECOMMENDED", "MAY", "OPTIONAL"
]
EXTENDED_BCP14 = [
    "NOT RECOMMENDED", "RECOMMENDS* NOT", "RECOMMENDS*",
    "NOT PERMITTED", "PERMITTED", "PERMITS*",
    "NOT REQUIRED", "NOT REQUIRES*", "REQUIRES*",
    "CAN NOT", "COULD NOT", "CAN", "COULD", "MIGHT",
    "NOT SUGGESTED", "SUGGESTED", "SUGGESTS*",
    "NOT ALLOWED", "ALLOWED", "ALLOWS*",
    "FORBIDDEN", "FORBIDS*", "PROHIBITED", "PROHIBITS*",
    "DEPRECATED", "HAVE TO"
]

Comments on these lists are most welcome.
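
As an illustration of how lists like these could drive the highlighting, here is a minimal sketch. It is not the script used to build the annotated HTML. It assumes that each entry is a regular-expression fragment (so `S*` means "zero or more S"), that the input is plain text that is already safe to embed in HTML, and that the `highlight` function and the `bcp14` / `extended` class names are hypothetical.

```python
import re

BCP14 = [
    "MUST NOT", "SHALL NOT", "SHOULD NOT",
    "MUST", "REQUIRED", "SHALL", "SHOULD",
    "RECOMMENDED", "MAY", "OPTIONAL",
]
EXTENDED_BCP14 = [
    "NOT RECOMMENDED", "RECOMMENDS* NOT", "RECOMMENDS*",
    "NOT PERMITTED", "PERMITTED", "PERMITS*",
    "NOT REQUIRED", "NOT REQUIRES*", "REQUIRES*",
    "CAN NOT", "COULD NOT", "CAN", "COULD", "MIGHT",
    "NOT SUGGESTED", "SUGGESTED", "SUGGESTS*",
    "NOT ALLOWED", "ALLOWED", "ALLOWS*",
    "FORBIDDEN", "FORBIDS*", "PROHIBITED", "PROHIBITS*",
    "DEPRECATED", "HAVE TO",
]

# Merge both lists and put the longest phrases first, so that
# "NOT RECOMMENDED" (extended) wins over "RECOMMENDED" (BCP 14).
ENTRIES = [(p, "bcp14") for p in BCP14] + [(p, "extended") for p in EXTENDED_BCP14]
ENTRIES.sort(key=lambda entry: len(entry[0]), reverse=True)

# One named group per phrase lets us recover the category of each match.
CATEGORY = {f"g{i}": category for i, (_, category) in enumerate(ENTRIES)}
PATTERN = re.compile(
    r"\b(?:"
    + "|".join(
        f"(?P<g{i}>" + r"\s+".join(phrase.split()) + ")"
        for i, (phrase, _) in enumerate(ENTRIES)
    )
    + r")\b",
    re.IGNORECASE,
)


def highlight(text):
    """Wrap each matched phrase in a <mark> element carrying its category."""
    def wrap(match):
        category = CATEGORY[match.lastgroup]
        return f'<mark class="{category}">{match.group(0)}</mark>'
    return PATTERN.sub(wrap, text)


if __name__ == "__main__":
    print(highlight("This is not recommended, but units may be omitted."))
```

Because the matched text is inserted unchanged, this keeps the original typeface and capitalization; only the surrounding markup is added.
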


3 participants