Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Frontmatter syntax #1210

Draft
wants to merge 6 commits into
base: master
Choose a base branch
from

Conversation

EmileTrotignon
Copy link
Collaborator

This adds sexp-syntax for the frontmatter.

There are still things to decide : the exact syntax (there are multiple ways to structure sexps), and the errors messages.

Compatibility with previous versions of OCaml is also not done.

I am not a 100% happy about my abstractions to match on sexps, this require some thinking and I will be happy to get input on that.

I added a dependency to sexplib, which feels reasonable to me, we are using sexps so we need something to parse them (and thats hard to write because of comments and escape sequences).
I also noticed that dune has an sexps parser, which we might be able to use if they export it, and this won't add any dependency. If we get to use their sexp-matching syntax its even better.

@@ -561,12 +561,22 @@ let append_alerts_to_comment alerts
let ast_to_comment ~internal_tags ~sections_allowed ~tags_allowed
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This ast_to_comment function does two orthogonal things:

  • turning the parser AST into an odoc AST
  • turning the parser frontmatter into an odoc frontmatter
    while the two are not dependent on each other.

This results in this function being called multiple times with a made up empty frontmatter, followed by an ignore of the output frontmatter.

To me, this suggests to have two functions: ast_to_comment (put back to its original version) and sexp_to_frontmatter (or some other name) for the frontmatter.

@dbuenzli
Copy link
Contributor

There are still things to decide : the exact syntax (there are multiple ways to structure sexps)

That's the problem with s-expressions. Honestly I don't think you should use s-expression. They have no standard. If you really think we need something beyond simple key: value I'd rather have JSON at that point.

Also I did not think a lot about but did people think about simply using OCaml attributes ? This bring back .mld files nearer to .mli files and may be otherwise useful to inform odoc from .mli where the preamble may not be doable for all sorts of reasons (e.g. copyright headers, etc.). Something like:

[@@@odoc.title "…"]
[@@@odoc.children "…"]

@@ -209,35 +236,35 @@ let%expect_test _ =
let module Trivial = struct
let empty =
test "";
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it be possible to have test print something very special in case there is a non-empty frontmatter, and otherwise use the previous output?
And another test_with_frontmatter which output the current version of test.

It would prevent having multiple empty parentheses which makes it hard to read in 99% of the cases.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed.

@jonludlam
Copy link
Member

That's the problem with s-expressions. Honestly I don't think you should use s-expression. They have no standard. If you really think we need something beyond simple key: value I'd rather have JSON at that point.

I take your point, though I suspect we can rely on https://github.com/janestreet/parsexp to be our definition of the 'standard' we adhere to for the question of 'what is a valid sexp'.

Also I did not think a lot about but did people think about simply using OCaml attributes ? This bring back .mld files nearer to .mli files and may be otherwise useful to inform odoc from .mli where the preamble may not be doable for all sorts of reasons (e.g. copyright headers, etc.). Something like:

[@@@odoc.title "…"]
[@@@odoc.children "…"]

I'm not convinced this is better than a simple sexp. From the point of view of parity between mli and mld files I'd not be unhappy with:

[@@@odoc.frontmatter "(title Foo)"]

Going back to what this PR implements, I think I'd prefer the dune style over jbuilder style, that is, rather (children foo bar baz) than (children (foo bar baz)).

@dbuenzli
Copy link
Contributor

dbuenzli commented Oct 11, 2024

I take your point, though I suspect we can rely on https://github.com/janestreet/parsexp to be our definition of the 'standard' we adhere to for the question of 'what is a valid sexp'.

Well that's not a standard by any means, there's not even a formal grammar there :-)

But more importantly it doesn't tell me how you encode dictionaries in sexps. Once you really start working with them you realize there's no such as thing as as "simple" s-expression: no one agrees on how to encode dictionaries which is exactly what your last sentence is about… We end up with "preferences", the particular one you propose being a pretty bad one from a generic s-expression processing perspective (see the linked comment).

Before adding the complexity of a full blown, ill-defined and user-unfriendly serialisation format, could we perhaps start by listing exactly what is supposed to go in these front-matter for now ?

Honestly I think that a couple of @@@odoc.* attributes, which are naturally well delimited, to derive a key-value map will perfectly do. Bonus point, your syntax highlighters and other tools like ppx for OCaml, will naturally work on them.

@EmileTrotignon
Copy link
Collaborator Author

I will never remember exactly how many @ i am supposed to use with the syntax you propose. And I have been using OCaml fulltime for multiple years, so I imagine relative newcomers will have even more issues. Because of this I am very much against using attribute syntax.
As for sexp, I am fine with them. They might be a bit tricky, but because of dune, any beginner will be familiar with them (when beginners do not have to use attributes at all until they need them).
I am in favor of sexps with the dune style, just because its used in dune and people will be familiar with it. Conceptually (a (b c)) is probably better, but I do not think this is a big concern here. Users being familiar with the frontmatter syntax is.

@dbuenzli
Copy link
Contributor

I will never remember exactly how many @ i am supposed to use with the syntax you propose.

This is a straw man. Soon or later you will disable a deprecation warning and you will learn. At that point you will be so happy to discover that there is actually some form of consistency in the design of the eco-system. And even if you don't learn, it will be just one example away, it's not as if you are going to write these every day.

They might be a bit tricky, but because of dune, any beginner will be familiar with them (when beginners do not have to use attributes at all until they need them).

Newcomers are unlikely to be familiar with them, most will actually struggle with them. Newcomers would much prefer to have JSON or whatever other configuration format à la mode they are familiar with in their dune file. Newcomers want what they know (not always rightfully), so let's skip over that "beginner friendly" argument aswell.

A reasonable assumption it so say that proficient dune users will be familiar with them. But will those still be in 10 or 20 years ?

I really don't think it's a good idea to add to the ocamldoc language an odd, non standard, s-expression syntax, used as the current configuration syntax, by one of the OCaml build tools, however dominant it may be now.

Other than that, this question:

Before adding the complexity of a full blown, ill-defined and user-unfriendly serialisation format, could we perhaps start by listing exactly what is supposed to go in these front-matter for now ?

remains unanswered.

@EmileTrotignon
Copy link
Collaborator Author

This is a straw man.

I do not see how that is a straw man. I have not misrepresented someones argument.

Soon or later you will disable a deprecation warning and you will learn.

I have used attributes before. I just do not remember the exact number of @ to use every time because it changes depending on the context and there are three option that I have seen : @, @@ and @@@. I am sure there are good reasons for such complexity, but it is very confusing.

Newcomers are unlikely to be familiar with them, most will actually struggle with them.

They might although I do not remember struggling that much, but my argument was that they will have to learn them anyways, which is not the case for attributes which are an advanced feature that you do not need that often.

Newcomers would much prefer to have JSON or whatever other configuration format à la mode they are familiar with in their dune file.

Most likely yes, but they don't have that, so they learn the dune syntax instead.

however dominant it may be now.

Its dominance is very relevant to the argument I am making. People are familiar with the format because dune is dominant. I care about the familiarity.

what is supposed to go in these front-matter for now ?

This PR is concerned with having a list of children pages. We will want to add other things later so we cannot just add a list separated by line returns. I do not know what the other stuff is, @jonludlam should be able to say more about this.

@dbuenzli
Copy link
Contributor

dbuenzli commented Oct 11, 2024

I do not see how that is a straw man. I have not misrepresented someones argument.

It's a straw man to the discussion. As the user of the language you will have to learn about attributes anyways whether your remember how much @ you need to put or not (also when you design, do not design for newcomers).

In general I rather have less things to learn for mastering the language, with this proposal you are forcing me to learn a new separator (--- IIUC) and dune's odd configuration format. The latter may well change at some point even if dune remains the dominant system. At that point you will be sorry to have that format enshrined in odoc's preambles. The attribute syntax is part of the OCaml language so it's less likely to change in the future.

Also there is a discussion upstream to add unit headers to source files. Before you go on inventing your own stuff, consistency with it is likely something that should be considered.

Its dominance is very relevant to the argument I am making. People are familiar with the format because dune is dominant. I care about the familiarity.

Familiar doesn't mean it's good (and is not a good measure for usability). Dominant even less…

This PR is concerned with having a list of children pages. We will want to add other things later so we cannot just add a list separated by line returns.

While I can see that you would not like to have a simple line based key: value format which always end up bringing problems at some point (and which attributes, being well delimited, wouldn't have), introducing this PR for that seems totally overkill to me.

@dbuenzli
Copy link
Contributor

dbuenzli commented Oct 14, 2024

Also another existing syntax that could be simply reused (an which is already parsed by odoc) is of course @-tags.

So this could simply be leading @children @title ocamldoc directives, cue @author which already exists. For .mli then this could be simply those that are leading in the first doc comment.

@EmileTrotignon
Copy link
Collaborator Author

In general I rather have less things to learn for mastering the language

That makes sense

with this proposal you are forcing me to learn a new separator (--- IIUC) and dune's odd configuration format.

For the separator, this stands for everyone, but for the dune syntax you are an outlier. The fact that you write documentation of a lot of important libraries matters in this, but I am pretty sure most users learn the dune syntax long before attributes.

The latter may well change at some point even if dune remains the dominant system.

If they can change something as big as their config language, I do not see why we could not also change later on.

I also don't think the current proposition is remotely as bad as you say it is. Sexps might not have an agreed upon way to describe dictionaries, but we can just pick one. ( children [child]* ) is not a crazy grammar at all.

@dbuenzli
Copy link
Contributor

dbuenzli commented Oct 14, 2024

If they can change something as big as their config language, I do not see why we could not also change later on.

Because this is a short term view.

You should treat the ocamldoc language the way the OCaml language is: a very stable language. In fact if upstream did design a language rather than engineer language features, it would deeply care about it and be an integral part of its definition.

There is absolutely no need to integrate dune's configuration format in ocamldoc's language. We have already enough syntax both in OCaml and ocamldoc to allow for what's needed here.

( children [child]* ) is not a crazy grammar at all.

Neither is @children [child]*. Bonus points: you did not invent anything new, your are fully consistent with other existing directives that odoc interprets and finally, it is agnostic to the syntax of the configuration language of your build system.

Honestly now that I gave a good thought to the proposal of this PR, it looks rather ridiculous. It's completely ignoring what already exists in the language.

@EmileTrotignon
Copy link
Collaborator Author

You should treat the ocamldoc language the way the OCaml language is: a very stable language.

As far as I know, dune is also quite stable and cares a lot about retro-compatibility.

Still, now that you showed the syntax for odoc tags, I agree its better (I had not read that comment before writing my previous message).

@panglesd
Copy link
Collaborator

Before adding the complexity of a full blown, ill-defined and user-unfriendly serialisation format, could we perhaps start by listing exactly what is supposed to go in these front-matter for now ?

Currently, I think:

  • children order,
  • short title (title to use in breadcrumbs, instead of the file name (as is currently used) or the full title (may be too long))
  • and opens (for the whole file).

Honestly now that I gave a good thought to the proposal of this PR, it looks rather [CENSORED]

I find the current proposal neither bad nor perfect: it syntactically forces the data to be at the top, separated from the content. Frontmatter are purely mld things. Sexp is just a serialization format, no problem with it (when it is not à la mode anymore, we'll just have a nostalgic feeling when seeing it, thinking how our QRCode based format is better).

I find the tag proposal neither bad nor perfect: It reuses part of the language, but a part that was not very nicely defined (we have three indistinguishable kind of them, with bad behavior in the "block case".
If we go the tag way, we will have to extend some of their syntax anyway: specifying the children order in a single line will be awful with many pages. So it cannot be a "line tag", but it cannot be a "block tag" either: we don't want all the rest of the .mld to be a list of children.

@dbuenzli
Copy link
Contributor

Thanks @panglesd for the list.

Then yes I think that importing any full blown serialisation format into ocamldoc for specifying these three things is inadequate and user unfriendly1.

This especially since the ocamldoc language already has syntax that allows us to specify directives, that users will have to learn about it anyways (e.g.@inline, @open, etc.) and which are literally screaming (can you hear them ?) to be further used for the items you mention.

I'm not interested in nostalgic feelings when I read an .mld file, I'm interested with dealing with a language that looks at the future and is consistently designed. Not something that looks like an arbitrary mash of different syntaxes and data formats.

If we go the tag way, we will have to extend some of their syntax anyway: specifying the children order in a single line will be awful with many pages.

If you look at the issue it rather looks like it's going to be a fix rather than an extension.

Just define block tags as @lpw25 suggested in the linked issue, i.e. stops at the next @tag or after a blank line. Then @children simply becomes a block @tag and you will have killed an odoc issue along the way.

Footnotes

  1. And mind you I'm not against using s-expressions for configuration because I'm not using dune but because I have used them myself for that in a few tools and determined that they were not generally adequate for the task for reasons already mentioned in this discussion.

@EmileTrotignon
Copy link
Collaborator Author

I agree using this opportunity to fix tags is a good idea. I am not sure using a blank line to do that is the way to go though. I think the intent of block tags is to allow any odoc syntax in them, and you might want empty lines in there.
Some sort of bracket could do it, maybe {| ... |}

@dbuenzli
Copy link
Contributor

I think the intent of block tags is to allow any odoc syntax in them, and you might want empty lines in there.

This is likely better discussed on the actual issue. But I tend to agree with what @lpw25 mentioned there:

Perhaps we should take a blank line as ending the tag? I think the actual tags we support would be fine only taking a single paragraph as input.

Or perhaps generalizing a bit: a block @tag simply marks a single block level construct of the ocamldoc language. In order words the list of nestable_block_elements becomes singletons here.

If people really still want block tags to span multiple blocks we could also add a convention that if you want multiple blocks you seperate them by a lone @ on its own line (rather than a blank line).

@jonludlam
Copy link
Member

We had a discussion about this during the recent odoc dev meeting and have come to the following
agreement.

  1. We'll use tags for the 3 metadata items that have been identified so far.
  2. The tags we'll use will be implemented using a new custom tag parsing, which will allow the use of arbitrarily named @tags in odoc docs. As part of this we'll change the current tag parsing such that the content of block tags will be terminated in a more sensible way - ie, certainly by an empty line,
    and possibly other ways. I did a survey of the current uses of block tags and concluded that more will be fixed by this change than broken.
  3. As more metadata is added to the content we'll keep open the possibility of a syntactically separate frontmatter, but we won't do that yet.
  4. These will be required to be at the very top of the document, ie, before the first heading.

I was thinking of having 'namespaced' tags for this, in the spirit of attributes, so the tags we use would be @odoc.children, @odoc.shortname and @odoc.open.

Does this sound OK?

@dbuenzli
Copy link
Contributor

I think this looks excellent.

Regarding this:

I was thinking of having 'namespaced' tags for this, in the spirit of attributes, so the tags we use would be @odoc.children, @odoc.shortname and @odoc.open.

I don't have a strong opinion but I feels a bit redundant to me. I'm not sure why upstream decided it was good to have ocaml. prefixed everywhere in attributes. In my opinion you can use the root space for the ocamldoc tool which nowadays is odoc and let third-party tag/attribute designers namespace their own tags.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants