Clarify and update CF rules for deprecating content #328

Open
ethanrd opened this issue May 20, 2021 · 13 comments
Labels
governance Changes to the rules for modifying the conventions documents


@ethanrd
Member

ethanrd commented May 20, 2021

The paragraph in the CF rules document that discusses deprecation is focused on recent (or even the most recent) changes and versions. The rules for deprecation should be updated and clarified for other situations. For instance, a deprecation in issue #314 impacts text that has been part of CF since version 1.0.

@JonathanGregory
Contributor

Dear @ethanrd

When we added that part to the rules, this situation had never arisen. I believe this is the first time. As I said in #314, I think that the aim of deprecation should be to discourage faulty data from being written, and it should be the minimal recommendation that would achieve that effect.

Cheers

Jonathan

@JonathanGregory
Contributor

Dear Klaus et al.

@zklaus commented as follows in #314:

> I was a bit confused by how the term deprecation was used by @davidhassell and @JonathanGregory, so I searched for it in the issue, finding that I myself introduced it here. Allow me to clarify how I understand it.
>
> Deprecation doesn't apply to versions of an artifact, be it a software package or a standards document. Rather, it applies to specific features. What it says is: "We think this feature should not be used going forward. To allow for a transition period, we do not remove it at this point in time, so you can still use it for a bit, but we'd rather you don't, and we want to remove it in a later version." In my mind, it does not retroactively declare past versions wrong, and writing a new file today that declares that it follows the CF conventions version 1.6 is perfectly legal, if ill-advised.
>
> Independent of any deprecation, we might want to have a recommendation to always use the latest version of CF available for new developments.

Maybe we didn't use the word "deprecation" correctly in the rules, where we wrote:

> If the change, once implemented in the conventions, subsequently turns out to be materially flawed, meaning that data written following the convention could be somehow erroneous or ambiguous, a github issue should urgently be opened to discuss whether to revoke the change. If this is agreed by a majority of the committee, a new version of the conventions will be prepared immediately, with the second digit of the version number incremented, and will be recommended to be used instead of the flawed version. The flawed version will be deprecated by a statement in the standard document and the conformance document. However, any data written with the flawed version will not be invalidated, although it may be problematic for users.

As Ethan has said, in this case there is no change to be revoked - we hadn't anticipated that we would discover an error that affects all existing versions! Nonetheless, the principle ought to apply. Data written with the flawed versions (all existing ones) is still legal, but might be problematic, so we want to minimise the use of these versions for new data. In my opinion, we are saying retroactively that all these versions are wrong - not everything which is legal is right, after all! To deprecate something means to express disapproval of it. That is what we are doing. We disapprove of all these versions, but only as regards the specific feature we are correcting. Hence my proposal that we deprecate the versions <1.9 in this feature only.

Best wishes

Jonathan

@ethanrd
Member Author

ethanrd commented May 21, 2021

Hi all - As I mentioned in issue #314, there are a number of deprecations in the current CF specification. Two involve backwards compatibility with COARDS and have been in CF since version 1.0: one involves non-compliance with Udunits and the other a (possibly temporary?) deprecation in the NUG. The rest are more recent changes.

Here’s the list of the deprecations currently found in CF 1.9-draft:

  • “The units level, layer, and sigma_level are allowed for dimensionless vertical coordinates to maintain backwards compatibility with COARDS. These units are not compatible with Udunits and are deprecated …”

  • The standard name modifiers number_of_observations and status_flag are deprecated in favor of the standard names number_of_observations and status_flag

  • The use of the projection_x_coordinate and projection_y_coordinate standard names is deprecated for the “geostationary” grid mapping. Instead, the projection_x_angular_coordinate and projection_y_angular_coordinate standard names should be used.

  • The use of the standard_parallel and scale_factor_at_projection_origin attributes is marked as deprecated (without explanation) for the “lambert_cylindrical_equal_area” grid mapping

  • The missing_value variable attribute - originally allowed for backward compatibility with COARDS but deprecated by NUG and then reinstated in later versions

    • CF 1.0 through 1.6, considered deprecated by NUG but allowed in CF for backward compatibility with COARDS.
    • CF 1.7 reinstates the use of the missing_value variable attribute
      • No discussion of why. Was there a NUG update?
    • See two items in Section “Revision History”
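The first item on this list is easy to check mechanically. A minimal sketch, assuming variables are available as plain dicts of attributes and that the caller has already identified a dimensionless vertical coordinate; `check_vertical_units` and `DEPRECATED_VERTICAL_UNITS` are hypothetical names, not part of the CF checker:

```python
# Hypothetical sketch: flag the COARDS-era units that CF deprecates for
# dimensionless vertical coordinates. Variables are modelled as plain dicts
# of attributes; this is not the API of any real checker or netCDF library.

DEPRECATED_VERTICAL_UNITS = {"level", "layer", "sigma_level"}


def check_vertical_units(name, attrs):
    """Return a list of warning messages for one vertical coordinate variable."""
    warnings = []
    units = attrs.get("units")
    if units in DEPRECATED_VERTICAL_UNITS:
        warnings.append(
            f"{name}: units '{units}' are deprecated (allowed only for "
            "backwards compatibility with COARDS; not valid Udunits)"
        )
    return warnings
```

For example, `check_vertical_units("lev", {"units": "level"})` yields one warning, while a variable with `units = "1"` yields none. A deprecation here only produces a warning: the file remains legal.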

@davidhassell
Contributor

Hello,

Is it right that the deprecations that Ethan lists (#328 (comment)) are still allowed? i.e. these are not wrong, but are discouraged. This is a different situation from #314, in which the formula terms were wrong and their old form is, from this time onwards, disallowed (is that the right word?) when writing CF<=1.8.

Perhaps we need an appendix to summarize this sort of information - deprecations and errors - as well as in the relevant parts of the text, for maximum visibility (I have a feeling this has already been suggested - but I can't find where!).

@JonathanGregory
Contributor

Dear @ethanrd and @davidhassell

I would say that the conformance document should provide our definitive list of deprecations. A "deprecation" there is a recommendation not to do it; the CF checker gives a warning about any recommendation that can be checked and isn't followed. Any deprecations that are mentioned in the text should be in the conformance document. Maybe not all of those Ethan has detailed are in it, but they should be, I would argue. The first one is there, for example (in section 3.1 of conformance). Not all of them can be checked automatically, or not easily, but they should still be stated anyway, I think.

The deprecation of flawed versions, like in #314, is different. In this case, we have identified an error in the convention, which allows metadata to be written that can't be interpreted reliably. In the other cases, there's nothing actually wrong, and the recommendations are made with the aim of writing metadata which is easier to use in some way.

Best wishes

Jonathan

@ethanrd
Member Author

ethanrd commented May 24, 2021

I agree the deprecation in #314 is different than those currently in the specification. Perhaps deprecation is not the right word for this usage. Maybe instead errata or corrigenda?

Whatever words we use, I'm not sure CF should (or how it would) "disallow" the #314 deprecation in earlier versions of CF. Any existing data written using this feature would have been conforming at the time it was written. To now make that data non-conforming does not seem right. On the other hand, a simple warning does not seem enough.

Perhaps CF needs a few categories of deprecations:

  • Current CF Deprecations: still allowed but discouraged (with a conformance warning in all future CF versions). Never to be disallowed.
  • Standard Deprecation: still allowed but discouraged (with a conformance warning, future CF versions may raise an error).
  • Error CF Deprecation: disallowed in future versions (with a conformance error). Back-ported to earlier CF versions, very strongly discouraged (with a conformance [very strong??] warning).

(I'm not sure this really helps: what would a "very strong warning" really mean? Written in all caps and bold text?)
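One way to read these three categories is as a checker severity that depends on the CF version a file declares. A minimal sketch, assuming versions are compared as (major, minor) tuples; `Category`, `severity` and `fixed_in` are hypothetical names for illustration only, and the sketch does not attempt to model a "very strong" warning:

```python
from enum import Enum


class Category(Enum):
    """The three hypothetical deprecation categories listed above."""
    CURRENT = "current"    # discouraged in every version, never disallowed
    STANDARD = "standard"  # discouraged; a future version may disallow it
    ERROR = "error"        # a flaw: disallowed from the fixing version onward


def severity(category, file_version, fixed_in=None):
    """Return "warning" or "error" for a file declaring file_version.

    Versions are (major, minor) tuples, e.g. (1, 8) for CF-1.8. fixed_in is
    the first version that disallows the flawed form (ERROR category only).
    """
    if category is Category.ERROR:
        if fixed_in is None:
            raise ValueError("an ERROR deprecation needs a fixed_in version")
        # An error from the fixing version onward; back-ported to older
        # versions only as a warning, so existing data stays conforming.
        return "error" if file_version >= fixed_in else "warning"
    # CURRENT and STANDARD deprecations only warn (until some future version
    # turns a STANDARD deprecation into a requirement).
    return "warning"
```

With `fixed_in=(1, 9)` (the version proposed for the #314 fix), a file declaring CF-1.8 gets a warning and a file declaring CF-1.9 gets an error.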

@JonathanGregory
Contributor

Dear @ethanrd

I would favour simplicity. At the moment, we have two categories in the conformance document. (1) Recommendations to do something. A recommendation not to do something is a deprecation. The CF checker gives warnings for recommendations (including deprecations) that are not met. (2) Requirements to do something. A requirement not to do something is a prohibition. The CF checker gives errors for requirements (including prohibitions) which are not met. I feel that this is sufficient, provided we make sure all the recommendations and requirements in the standard are included in the conformance document. We can't foresee what will happen in future versions of the convention.
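The two-category scheme can be stated compactly: the checker severity depends only on whether a rule is a requirement or a recommendation, and the negative forms (prohibition, deprecation) are just names for rules that say "don't". A minimal sketch; `Rule` and its fields are a hypothetical model of a conformance-document entry, not the CF checker's actual data structures:

```python
from dataclasses import dataclass

# Names for each (level, negative) combination in this hypothetical model.
_KIND = {
    ("requirement", False): "requirement",
    ("requirement", True): "prohibition",
    ("recommendation", False): "recommendation",
    ("recommendation", True): "deprecation",
}


@dataclass(frozen=True)
class Rule:
    text: str
    level: str              # "requirement" or "recommendation"
    negative: bool = False  # True if the rule says *not* to do something

    @property
    def kind(self):
        # A prohibition is a negative requirement and a deprecation is a
        # negative recommendation, so no extra categories are needed.
        return _KIND[(self.level, self.negative)]

    @property
    def checker_severity(self):
        # Only the level matters: unmet requirements are errors and unmet
        # recommendations (including deprecations) are warnings.
        return "error" if self.level == "requirement" else "warning"


units_rule = Rule("Do not use the units level, layer or sigma_level.",
                  "recommendation", negative=True)
print(units_rule.kind, units_rule.checker_severity)  # deprecation warning
```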

Maybe we should make the #314 deprecation into a prohibition instead i.e. disallow (as you say) the flawed old version of "sigma over z"? Then the checker would give an error if it detects it. We could make clear in stating the requirement that it applies to new data (from now on), and does not invalidate existing data (although such data may unavoidably be problematic).

Best wishes

Jonathan

@erget
Member

erget commented May 26, 2021

TL;DR: My opinion is that the rules are sound for correcting errata, but we do not describe what to do in the case of deprecation. This may not be necessary because we could consider deprecation normal care and feeding for the standard. I do agree that we should have a list of deprecations and the CF Checker should warn of deprecated features. We could consider deciding upon and announcing versions at which we will remove deprecated features.

Musings:
I too prefer simplicity, when it's possible. Would it be possible to use the deprecation mechanism differently for two classes of items?

  1. Errata
  2. Discontinued features

My reasoning here is that we would remove these items from the standard with different speeds.

As @JonathanGregory highlights, there is a procedure for errata (abridged by me):

> If the change... turns out to be ... flawed, ... a new version of the conventions will be prepared immediately, with the second digit of the version number incremented, and will be recommended to be used instead of the flawed version. The flawed version will be deprecated by a statement in the standard document and the conformance document. However, any data written with the flawed version will not be invalidated, although it may be problematic for users.

This remains reasonable in my mind, and it would be used in the case that quick action is needed.

Then there are deprecations like those @ethanrd notes. I'll speak to the use of projection_x_coordinate and projection_y_coordinate in conjunction with the geostationary grid mapping, as I was involved in that one. It is a feature that we discovered to be erroneous and deprecated. However, we've been living with this error for many years now and nobody complained; the deprecation was due to an (over-)abundance of precision. Thus we have not yet set a date at which mention of those standard names will be completely removed from that part of the standard; it is not urgent.

We could adopt the same approach if we ever have a feature that is overly complex and we no longer want to support the use of it - I'm not advocating this, but for the sake of argument let's say it's the use of packed data described in Section 8.1. In that case, we could deprecate the feature, and even inform users with e.g. 2 releases of notice that it will at some point disappear from the standard. In this case it would not be due to an error, but we could treat it the same way.

@zklaus

zklaus commented May 26, 2021

I agree with what @ethanrd and @erget said, namely that we have errata and what I would call deprecations.

I think it is quite important to actually remove deprecations at some point, preferably under a predictable policy, e.g. two versions after the initial deprecation. The reason is that these features really become a burden on producers of CF tooling, like libraries for reading CF files. If we essentially allow everything indefinitely, it becomes increasingly difficult to produce a conforming implementation. This really is worse in the case of deprecations and errata than with normal evolution, because deprecation often seems to happen together with a new formulation taking the place of the deprecated feature. That, in turn, implies that as a library maintainer now you need to take care of two different formulations of the same phenomenon, of which one is known to be bad in some sense.
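A predictable removal policy such as "two versions after the initial deprecation" is simple enough to state as code. A minimal sketch, assuming versions are (major, minor) tuples and that the notice period counts minor releases; `removal_version` and `still_allowed` are hypothetical helpers, and this is one possible reading of the proposal, not an adopted CF rule:

```python
def removal_version(deprecated_in, notice=2):
    """Version in which a deprecated feature would be removed.

    deprecated_in is a (major, minor) tuple. Assumes, hypothetically, that
    the notice period counts minor releases within one major version.
    """
    major, minor = deprecated_in
    return (major, minor + notice)


def still_allowed(deprecated_in, file_version, notice=2):
    """True while a file declaring file_version may still use the feature."""
    return file_version < removal_version(deprecated_in, notice)
```

Under this reading, a feature deprecated in CF-1.9 would be removed in CF-1.11: files declaring CF-1.10 could still use it, files declaring CF-1.11 could not.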

If somebody really needs to rely on an old feature, they are always free to write a file according to the old standard and put the corresponding ":Conventions" attribute inside.

@JonathanGregory
Contributor

Dear all

I'd like to repeat my earlier points that

  • We should make use of the existing list, namely the conformance document, for the purposes being discussed here - I don't think we need a new list.

  • We don't have to distinguish positive and negative categories, because they are logically related: prohibition = negative requirement, deprecation = negative recommendation.

Does anyone disagree with those points, I wonder?

Which of these categories should be used if we discover a flaw in the convention, which allows metadata to be produced that can't be interpreted correctly or reliably (as in #314)? The rules say "deprecate" but I think now that's too weak. We should prohibit the use of the flawed convention (not the whole version, just the affected part) for writing new data (but also reassure users that existing data isn't being invalidated).

I appreciate the arguments about the need for a further distinction, and I agree this could help in other cases. I suggest that we need to distinguish between recommendations which are made for good practice (and could remain for ever), and recommendations which are made because there are alternative ways to do something where one is preferred and the other might be abolished in future. That is, we would have three categories in the conformance document, rather than two. An example of a good-practice recommendation in the conformance document is "The name of a multidimensional coordinate variable should not match the name of any of its dimensions." We do not envisage making this a requirement.

In general, we do not try to foresee the future of the CF convention, so I think most of the current recommendations are for good-practice. There should only be a few where we think it's really likely that we are going to abolish something in future. I note that, up to now, we have not abolished anything. One reason for that is because past data continues to exist for a long time. Therefore it's hard to withdraw support for any feature in data-reading programs without causing inconvenience, although you can in data-writing programs. I know that you can always inspect the Conventions attribute, but most user programs don't pay attention to that, I imagine, and we should avoid making things awkward for users.

Hence I would suggest identifying which current recommendations in the conformance document, or which should be in it but aren't, ought to be promoted to a new category of things which really might become requirements/prohibitions in future. What could this new category be called? Warnings?

Best wishes

Jonathan

@zklaus

zklaus commented May 27, 2021

> • We should make use of the existing list, namely the conformance document, for the purposes being discussed here - I don't think we need a new list.

I agree with this.

> • We don't have to distinguish positive and negative categories, because they are logically related: prohibition = negative requirement, deprecation = negative recommendation.

I don't understand this.

> Which of these categories should be used if we discover a flaw in the convention, which allows metadata to be produced that can't be interpreted correctly or reliably (as in #314)? The rules say "deprecate" but I think now that's too weak. We should prohibit the use of the flawed convention (not the whole version, just the affected part) for writing new data (but also reassure users that existing data isn't being invalidated).

> I appreciate the arguments about the need for a further distinction, and I agree this could help in other cases. I suggest that we need to distinguish between recommendations which are made for good practice (and could remain forever), and recommendations which are made because there are alternative ways to do something where one is preferred and the other might be abolished in the future. That is, we would have three categories in the conformance document, rather than two. An example of a good-practice recommendation in the conformance document is "The name of a multidimensional coordinate variable should not match the name of any of its dimensions." We do not envisage making this a requirement.

> In general, we do not try to foresee the future of the CF convention, so I think most of the current recommendations are for good practice. There should only be a few where we think it's really likely that we are going to abolish something in the future. I note that, up to now, we have not abolished anything. One reason for that is because past data continues to exist for a long time. Therefore it's hard to withdraw support for any feature in data-reading programs without causing inconvenience, although you can in data-writing programs. I know that you can always inspect the Conventions attribute, but most user programs don't pay attention to that, I imagine, and we should avoid making things awkward for users.

This may very well be a chicken-and-egg problem. After all, if I have to support every feature since the first version anyway, why check the version number? It seems to me that this approach becomes less and less tractable as the complexity of the conventions increases.

> Hence I would suggest identifying which current recommendations in the conformance document, or which should be in it but aren't, ought to be promoted to a new category of things which really might become requirements/prohibitions in the future. What could this new category be called? Warnings?

Continuing this line of thought and drawing further analogy with version numbers in software packages, a flaw that leads to incorrect or uninterpretable metadata could be considered a bug and as such warrant the release of a bug-fix version of the conventions. This could mean releasing version 1.7.1, 1.8.1, and 1.9.1 all at the same time, only changing the relevant bug but otherwise not impacting the feature set of the corresponding version. To avoid undue burden and maintenance of long-obsolete versions one would probably want to declare only a very limited set of versions as supported, say the last two.

This way, old versions can be updated to fix manifest flaws, which makes it also more plausible to actually retire deprecated features because a user that relies on such a feature can find comfort in knowing that the old version he now must use does not fall quickly into disrepair.
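The bug-fix release idea can be made concrete: given the versions released so far and a support window, work out which patch versions would be issued for a flaw that affects all of them. A minimal sketch of that hypothetical policy; `bugfix_releases` and its `supported` parameter are illustrative names, and CF does not currently issue patch versions:

```python
def bugfix_releases(released, supported=2):
    """Patch versions to publish when one flaw affects every release.

    released holds (major, minor) or (major, minor, patch) tuples. Only the
    newest `supported` minor versions get a fix, each bumped by one patch
    level. This is a hypothetical policy for illustration.
    """
    if supported <= 0:
        return []
    # Highest patch level already released on each (major, minor) line.
    latest = {}
    for version in released:
        line = tuple(version[:2])
        patch = version[2] if len(version) > 2 else 0
        latest[line] = max(latest.get(line, 0), patch)
    newest = sorted(latest)[-supported:]
    return [(major, minor, latest[(major, minor)] + 1) for major, minor in newest]
```

`bugfix_releases([(1, 6), (1, 7), (1, 8), (1, 9)], supported=3)` returns `[(1, 7, 1), (1, 8, 1), (1, 9, 1)]`, i.e. 1.7.1, 1.8.1 and 1.9.1 released together; with the default window of two, only 1.8.1 and 1.9.1 would be issued.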

User software, which already increasingly is built on standard libraries for interacting with CF data, can check the conventions attribute in a meaningful way and support different versions of the conventions even when the feature sets differ or offer different best practice implementations for the same encoding requirement.
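Checking the Conventions attribute "in a meaningful way" starts with extracting the CF version from it and comparing versions numerically. A minimal sketch, assuming the attribute is a blank- or comma-separated list of convention names such as "CF-1.8 ACDD-1.3"; `cf_version` is a hypothetical helper, and the three-part form "CF-1.9.1" is accepted only because of the bug-fix proposal above:

```python
import re

# Matches a CF token such as "CF-1.8", "CF-1.10" or a hypothetical "CF-1.9.1".
_CF_TOKEN = re.compile(r"^CF-(\d+)\.(\d+)(?:\.(\d+))?$")


def cf_version(conventions):
    """Return the CF version in a Conventions attribute as a tuple, or None.

    Assumes the attribute is a blank- or comma-separated list of conventions;
    non-CF entries (e.g. "ACDD-1.3") are skipped.
    """
    for token in re.split(r"[,\s]+", conventions.strip()):
        match = _CF_TOKEN.match(token)
        if match:
            return tuple(int(g) for g in match.groups() if g is not None)
    return None
```

Because tuples compare element by element, `cf_version("CF-1.10") > cf_version("CF-1.9")` is `True`, whereas comparing the raw strings `"CF-1.10" > "CF-1.9"` gives `False`. A reader library could then choose between the old and new encoding of a feature based on the returned tuple.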

@JonathanGregory is of course correct that all of this goes far beyond #314, but that is how I understood the intent of @ethanrd in opening this issue.

@JonathanGregory
Contributor

Dear @zklaus

> We don't have to distinguish positive and negative categories, because they are logically related: prohibition = negative requirement, deprecation = negative recommendation.

Sorry to be unclear. I meant by this to explain that the conformance document has only two categories, viz. recommendation and requirement, and that we don't need to have deprecation and prohibition as two more, because a prohibition is a kind of requirement and a deprecation is a kind of recommendation. Does that make sense?

I agree it is possible that we should remove some features when there is an alternative better way, but I think there has to be a strong case for it. Personally, being mostly a data-analyst and data-producer, I think we should favour making it attractive and easy for data-readers and data-writers to use the conventions (correctly, of course, and not carelessly), even if that makes a bit more work for authors of software. That is because writers of widely used software are generally software experts, whereas producers and analysts of data are generally not software experts.

Again I would suggest that a helpful way to proceed would be to identify any current deprecations (recommendations against doing things) in the conformance or conventions document where there could be a strong case for removing some feature in future. Then we will see how large an issue it is, which will help us to decide how to deal with it.

Best wishes

Jonathan

@ChrisBarker-NOAA
Contributor

Picking this up again:

from above:

> we don't need to have deprecation and prohibition as two more, because a prohibition is a kind of requirement and a deprecation is a kind of recommendation.

Yes, but deprecation is a very specific kind of recommendation -- it recommends you don't use the thing that's deprecated, but it very explicitly states that it might go away, at some time in the future.

I think:

"this isn't a good idea"

is different than

"this isn't a good idea, and it will not work in the future" -- even if "future" isn't currently defined.

But most critically, if we EVER want to be able to remove anything some day (CF 2) -- we will need to be able to clearly define that there's been a deprecation period -- be able to say "you were warned".

And even if we have no idea when that breaking change will actually occur, it would be good to start preparing now -- if there's a feature we'd like to remove, let's deprecate it now.

So yes:

> a helpful way to proceed would be to identify any current deprecations (recommendations against doing things) in the conformance or conventions document where there could be a strong case for removing some feature in future.

That's the next step.

JonathanGregory added the governance (Changes to the rules for modifying the conventions documents) label on Sep 23, 2024

6 participants