Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Footnotes extension #332

Merged
merged 39 commits into from
Sep 12, 2024
Merged

Footnotes extension #332

merged 39 commits into from
Sep 12, 2024

Conversation

robinst
Copy link
Collaborator

@robinst robinst commented Jul 7, 2024

This adds a new extension commonmark-ext-footnotes (class org.commonmark.ext.footnotes.FootnotesExtension) to implement footnotes syntax as in GitHub Flavored Markdown (see docs). Fixes #273.

An example:

Some text with a footnote[^1].

[^1]: The text of the footnote.

The [^1] is parsed as a FootnoteReference, with 1 being the label.
The line with [^1]: ... is a FootnoteDefinition, with the contents as child nodes (can be a paragraph like in the example, or other blocks like lists).

Apart from the parsing, the extension also comes with rendering of footnotes for HTML and Markdown.

Extension mechanisms

In order to implement this as a separate extension, the following APIs were added to commonmark core:

  • DefinitionMap: New class for storing and looking up definitions by a label, with label normalization as for link reference definitions
  • BlockParser: New method getDefinitions that can be implemented to return definitions that can later be accessed during inline parsing (the built-in ParagraphParser also uses that mechanism now; previously it was a special case in the parser)
  • LinkProcessor: New interface that can be implemented to customize link/image processing. This is used to turn [^1] into FootnoteReference nodes.
  • NodeRenderer: New methods beforeRoot and afterRoot that are called before/after rendering a document; used to render footnotes at the end of the document

Alternatives considered

PostProcessor

Could footnote reference parsing have been implemented as a PostProcessor step after inline parsing? No, because a foonote reference like [^*foo*] would have been turned into emphasis by inline parsing, whereas footnote parsing needs the raw *foo* as a label.

InlineContentParser

I considered using the recently-added inline parsing customization API, using [ as the trigger character. That would work for simple cases, but not for others. E.g. in this:

[^foo](/url)

[^foo]: note

That is not a footnote followed by (/url), but instead it's an inline link. In other words, if parsing as a link is possible, that is preferred.

That means our custom inline parser for [ would have to be able to parse the full link syntax in order to give preference to links, which is quite tricky. In addition to that, it would have have to trigger on !, for a footnote like ![^foo], which normally would be parsed as an image node.

So that's what LinkProcessor solves: It keeps the tricky link parsing in the inline parser, but allows extensions to decide to treat certain things not as links, but different types of nodes, or maybe even parse things that come after a link (e.g. image attributes could be implemented on top of this).

See `_scan_footnote_definition` in cmark-gfm.
It started out limited but now it covers all types of links/images,
knows about destination and title, etc.
This turned out to be tricky, and GitHub gets some of it wrong.
If anyone ever wants us to be bug-compatible, it should be relatively
straightforward to emulate GitHub by just running the initial reference
search over everything (including definitions) and then not bothering
with finding more at the end.
@robinst
Copy link
Collaborator Author

robinst commented Jul 7, 2024

I've found some interesting behaviors (bugs) in GitHub's implementation while working on this. E.g.:

[^1]: One [^2]
[^2]: Two

That shouldn't render anything, because only 2 is referenced but from another footnote that is not rendered. But GitHub renders "Two" as the only footnote, with a "back" link pointing nowhere.

Another one:

[^1]: One [^2]

Test [^1]

[^2]: Two

The order of footnotes should be One (referenced first in the text), then Two (referenced from a footnote). But GitHub renders Two first (because it finds the [^2] reference first).


I've decided not to follow GitHub's implementation for these edge cases, but instead go for the nicest result. See docs on FootnoteHtmlNodeRenderer.java for more about this. If bug-for-bug-compatibility is required at some point it should be simple enough to add as an option.

@robinst robinst mentioned this pull request Jul 7, 2024
The resulting implementation for the footnotes extension is much nicer.
It also cleans up LinkInfo and makes images less of a special case.

Additionally, this allow inline parsing of markers that are not
part of links - could have done this without this change but noticed
it here and decided to fix it.
Copy link

codecov bot commented Sep 6, 2024

Codecov Report

Attention: Patch coverage is 96.62921% with 9 lines in your changes missing coverage. Please review.

Project coverage is 95.01%. Comparing base (591b452) to head (e3e38ef).
Report is 40 commits behind head on main.

Files with missing lines Patch % Lines
...java/org/commonmark/internal/InlineParserImpl.java 95.48% 1 Missing and 5 partials ⚠️
...c/main/java/org/commonmark/node/DefinitionMap.java 86.66% 2 Missing ⚠️
...g/commonmark/internal/InlineParserContextImpl.java 87.50% 1 Missing ⚠️
Additional details and impacted files
@@             Coverage Diff              @@
##               main     #332      +/-   ##
============================================
- Coverage     95.05%   95.01%   -0.05%     
  Complexity      254      254              
============================================
  Files           131      136       +5     
  Lines          4185     4350     +165     
  Branches        600      617      +17     
============================================
+ Hits           3978     4133     +155     
- Misses          111      116       +5     
- Partials         96      101       +5     
Files with missing lines Coverage Δ
...src/main/java/org/commonmark/internal/Bracket.java 100.00% <100.00%> (ø)
...main/java/org/commonmark/internal/Definitions.java 100.00% <100.00%> (ø)
...n/java/org/commonmark/internal/DocumentParser.java 98.18% <100.00%> (ø)
...onmark/internal/LinkReferenceDefinitionParser.java 97.03% <ø> (ø)
.../java/org/commonmark/internal/ParagraphParser.java 100.00% <100.00%> (ø)
.../commonmark/internal/inline/CoreLinkProcessor.java 100.00% <100.00%> (ø)
...org/commonmark/internal/inline/LinkResultImpl.java 100.00% <100.00%> (ø)
.../commonmark/internal/renderer/NodeRendererMap.java 100.00% <100.00%> (ø)
...onmark/src/main/java/org/commonmark/node/Link.java 80.00% <ø> (ø)
...onmark/src/main/java/org/commonmark/node/Node.java 96.00% <ø> (ø)
... and 9 more

@robinst robinst merged commit c910105 into main Sep 12, 2024
11 of 12 checks passed
@robinst robinst deleted the footnotes-extension branch September 12, 2024 13:33
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Support for footnotes
1 participant