Footnotes extension #332

robinst · 2024-07-07T13:15:20Z

This adds a new extension commonmark-ext-footnotes (class org.commonmark.ext.footnotes.FootnotesExtension) to implement footnotes syntax as in GitHub Flavored Markdown (see docs). Fixes #273.

An example:

Some text with a footnote[^1].

[^1]: The text of the footnote.

The [^1] is parsed as a FootnoteReference, with 1 being the label.
The line with [^1]: ... is a FootnoteDefinition, with the contents as child nodes (can be a paragraph like in the example, or other blocks like lists).
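
To make the definition-line shape concrete, here is a minimal standalone sketch of recognizing such a line. It is not the extension's block parser: the class name `FootnoteDefinitionLineSketch` and the exact rules in the regex (up to three spaces of indentation, a label without `]` or whitespace, optional spaces after the colon) are assumptions for illustration only.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper, not part of commonmark-java. */
final class FootnoteDefinitionLineSketch {
    // Assumed shape: up to 3 spaces, "[^", a label without "]" or whitespace,
    // "]:", optional spaces or tabs, then the start of the footnote content.
    private static final Pattern DEFINITION =
            Pattern.compile(" {0,3}\\[\\^([^\\]\\s]+)\\]:[ \\t]*(.*)");

    /** Returns {label, first line of content} if the whole line looks like a definition. */
    static Optional<String[]> parse(String line) {
        Matcher m = DEFINITION.matcher(line);
        if (!m.matches()) {
            return Optional.empty();
        }
        return Optional.of(new String[] {m.group(1), m.group(2)});
    }
}
```

With this sketch, the definition line from the example yields the label `1`, while the paragraph line containing `[^1]` in the middle does not match, because the pattern must cover the whole line from its start.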

Apart from the parsing, the extension also comes with rendering of footnotes for HTML and Markdown.

Extension mechanisms

In order to implement this as a separate extension, the following APIs were added to commonmark core:

DefinitionMap: New class for storing and looking up definitions by a label, with label normalization as for link reference definitions
BlockParser: New method getDefinitions that can be implemented to return definitions that can later be accessed during inline parsing (the built-in ParagraphParser also uses that mechanism now; previously it was a special case in the parser)
LinkProcessor: New interface that can be implemented to customize link/image processing. This is used to turn [^1] into FootnoteReference nodes.
NodeRenderer: New methods beforeRoot and afterRoot that are called before/after rendering a document; used to render footnotes at the end of the document
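
As a rough illustration of what "looking up definitions by a label, with label normalization" means, here is a standalone sketch. It is not the real DefinitionMap: the class name `LabelMapSketch` and the simplified normalization (trim, collapse whitespace runs, lower-case) are assumptions, and the actual class may normalize differently.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

/** Hypothetical stand-in for a label-keyed definition store; not the real DefinitionMap. */
final class LabelMapSketch<D> {
    private final Map<String, D> definitions = new LinkedHashMap<>();

    /** Simplified normalization (an assumption): trim, collapse whitespace runs, lower-case. */
    static String normalize(String label) {
        return label.trim().replaceAll("\\s+", " ").toLowerCase(Locale.ROOT);
    }

    /**
     * Stores the definition unless the normalized label is already taken.
     * Returns the existing definition, or null if this one was stored.
     */
    D putIfAbsent(String label, D definition) {
        return definitions.putIfAbsent(normalize(label), definition);
    }

    /** Looks up a definition by label, applying the same normalization. */
    D get(String label) {
        return definitions.get(normalize(label));
    }
}
```

The first definition for a label wins here, matching how CommonMark treats duplicate link reference definitions.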

Alternatives considered

PostProcessor

Could footnote reference parsing have been implemented as a PostProcessor step after inline parsing? No, because a footnote reference like [^*foo*] would have been turned into emphasis by inline parsing, whereas footnote parsing needs the raw *foo* as a label.

InlineContentParser

I considered using the recently-added inline parsing customization API, using [ as the trigger character. That would work for simple cases, but not for others. E.g. in this:

[^foo](/url)

[^foo]: note

That is not a footnote followed by (/url), but instead it's an inline link. In other words, if parsing as a link is possible, that is preferred.

That means our custom inline parser for [ would have to be able to parse the full link syntax in order to give preference to links, which is quite tricky. In addition to that, it would have to trigger on !, for a footnote like ![^foo], which normally would be parsed as an image node.

So that's what LinkProcessor solves: It keeps the tricky link parsing in the inline parser, but allows extensions to decide to treat certain things not as links, but different types of nodes, or maybe even parse things that come after a link (e.g. image attributes could be implemented on top of this).
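
The preference rule can be sketched in isolation like this. This is not the LinkProcessor interface itself: `BracketDecisionSketch`, `Kind` and `classify` are hypothetical names, and leaving a reference to an undefined footnote as plain text is an assumption.

```java
import java.util.Set;

/** Hypothetical decision helper; not the LinkProcessor API. */
final class BracketDecisionSketch {
    enum Kind { LINK, FOOTNOTE_REFERENCE, PLAIN_TEXT }

    /**
     * @param bracketText          text between "[" and "]", e.g. "^foo"
     * @param hasInlineDestination whether something like "(/url)" directly follows the "]"
     * @param footnoteLabels       labels of the footnote definitions found during block parsing
     */
    static Kind classify(String bracketText, boolean hasInlineDestination, Set<String> footnoteLabels) {
        // If the brackets can be parsed as an inline link, the link wins.
        if (hasInlineDestination) {
            return Kind.LINK;
        }
        // Otherwise "^label" becomes a footnote reference if a definition exists.
        if (bracketText.startsWith("^") && footnoteLabels.contains(bracketText.substring(1))) {
            return Kind.FOOTNOTE_REFERENCE;
        }
        // Assumption: anything else is left alone as text.
        return Kind.PLAIN_TEXT;
    }
}
```

For the example above, `classify("^foo", true, Set.of("foo"))` gives `LINK`, and the same call with `false` gives `FOOTNOTE_REFERENCE`. The point of the design is that the link-syntax scanning that produces `hasInlineDestination` stays in the core inline parser, and the extension only makes the final decision.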

This turned out to be tricky, and GitHub gets some of it wrong. If anyone ever wants us to be bug-compatible, it should be relatively straightforward to emulate GitHub by just running the initial reference search over everything (including definitions) and then not bothering with finding more at the end.

robinst · 2024-07-07T13:36:02Z

I've found some interesting behaviors (bugs) in GitHub's implementation while working on this. E.g.:

[^1]: One [^2]
[^2]: Two

That shouldn't render anything, because footnote 2 is only referenced from footnote 1, which is itself never referenced and so is not rendered. But GitHub renders "Two" as the only footnote, with a "back" link pointing nowhere.

Another one:

[^1]: One [^2]

Test [^1]

[^2]: Two

The order of footnotes should be One (referenced first in the text), then Two (referenced from a footnote). But GitHub renders Two first (because it finds the [^2] reference first).

I've decided not to follow GitHub's implementation for these edge cases, but instead go for the nicest result. See docs on FootnoteHtmlNodeRenderer.java for more about this. If bug-for-bug-compatibility is required at some point it should be simple enough to add as an option.
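
The intended ordering for the two examples can be sketched as a small standalone function. This is not the code in FootnoteHtmlNodeRenderer: `FootnoteOrderSketch` and `renderOrder` are hypothetical names, and the exact order when several footnotes contain nested references is an assumption.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical ordering helper; not the renderer's actual implementation. */
final class FootnoteOrderSketch {
    /**
     * @param textReferences labels referenced in the main text, in document order
     * @param definitions    for each defined label, the labels referenced inside that definition
     * @return labels of the footnotes to render, in rendering order
     */
    static List<String> renderOrder(List<String> textReferences, Map<String, List<String>> definitions) {
        Set<String> seen = new HashSet<>();
        List<String> order = new ArrayList<>();
        // First pass: references in the main text, in order of first appearance.
        for (String label : textReferences) {
            if (definitions.containsKey(label) && seen.add(label)) {
                order.add(label);
            }
        }
        // Second pass: references found inside footnotes that will be rendered.
        // The list grows while we iterate, so nested references are followed too.
        for (int i = 0; i < order.size(); i++) {
            for (String nested : definitions.get(order.get(i))) {
                if (definitions.containsKey(nested) && seen.add(nested)) {
                    order.add(nested);
                }
            }
        }
        return order;
    }
}
```

In the first example there are no references in the text, so nothing is rendered even though footnote 1 mentions footnote 2. In the second, `[^1]` from the text comes first and `[^2]` is picked up from inside it, giving One and then Two.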

The resulting implementation for the footnotes extension is much nicer. It also cleans up LinkInfo and makes images less of a special case. Additionally, this allows inline parsing of markers that are not part of links. That could have been done without this change, but I noticed it here and decided to fix it.

codecov · 2024-09-06T14:19:53Z

Codecov Report

Attention: Patch coverage is 96.62921% with 9 lines in your changes missing coverage. Please review.

Project coverage is 95.01%. Comparing base (591b452) to head (e3e38ef).
Report is 40 commits behind head on main.

Files with missing lines	Patch %	Lines
...java/org/commonmark/internal/InlineParserImpl.java	95.48%	1 Missing and 5 partials ⚠️
...c/main/java/org/commonmark/node/DefinitionMap.java	86.66%	2 Missing ⚠️
...g/commonmark/internal/InlineParserContextImpl.java	87.50%	1 Missing ⚠️

Additional details and impacted files

@@             Coverage Diff              @@
##               main     #332      +/-   ##
============================================
- Coverage     95.05%   95.01%   -0.05%     
  Complexity      254      254              
============================================
  Files           131      136       +5     
  Lines          4185     4350     +165     
  Branches        600      617      +17     
============================================
+ Hits           3978     4133     +155     
- Misses          111      116       +5     
- Partials         96      101       +5

Files with missing lines	Coverage Δ
...src/main/java/org/commonmark/internal/Bracket.java	`100.00% <100.00%> (ø)`
...main/java/org/commonmark/internal/Definitions.java	`100.00% <100.00%> (ø)`
...n/java/org/commonmark/internal/DocumentParser.java	`98.18% <100.00%> (ø)`
...onmark/internal/LinkReferenceDefinitionParser.java	`97.03% <ø> (ø)`
.../java/org/commonmark/internal/ParagraphParser.java	`100.00% <100.00%> (ø)`
.../commonmark/internal/inline/CoreLinkProcessor.java	`100.00% <100.00%> (ø)`
...org/commonmark/internal/inline/LinkResultImpl.java	`100.00% <100.00%> (ø)`
.../commonmark/internal/renderer/NodeRendererMap.java	`100.00% <100.00%> (ø)`
...onmark/src/main/java/org/commonmark/node/Link.java	`80.00% <ø> (ø)`
...onmark/src/main/java/org/commonmark/node/Node.java	`96.00% <ø> (ø)`
... and 9 more

robinst added 30 commits April 28, 2024 11:58

WIP footnotes: Block parsing

3d5c730

Move code to new ext-footnotes module with extension

1f6e729

Refactor link/image parsing into more manageable pieces

0c17ae8

Make bracket processing extensible via a BracketProcessor

92ef1d0

Allow adding BracketProcessors, use for footnotes extension

7500905

Actually parse child nodes of footnote definition

f30c787

Extract DefinitionMap as a public class

04ba63f

Allow BlockParsers to return definitions (for lookup during inline parsing)

4c2f729

Fix replace mode

016eea8

Implement startFromBracket

0606f05

More test cases

aa90ab0

Let full reference links override footnotes

947b1e5

Footnotes: HTML and Markdown rendering

05621db

Use a single map

4616d50

Address TODO for paragraph rendering

b10bd57

Add exclude list for label characters

7fe1a9e

See `_scan_footnote_definition` in cmark-gfm.

Check for indentation

47e622e

Address TODO in bracket processor

194067b

Also use processor for inline links

cbe5925

Set source span, rename methods

982e6a5

Address TODO in markdown renderer

407d0e0

Rename BracketProcessor to LinkProcessor

e619eac

It started out limited but now it covers all types of links/images, knows about destination and title, etc.

Remove unused ReferenceType, add docs

a2258fa

Skip spaces after colon in definition

f028716

Change footnote visiting, move docs to class

5425f63

Adjust Markdown renderer to changed parsing

958048d

Add docs for footnotes extension

0a8c993

Add docs for LinkProcessor

214c195

Documentation tweaks

c68809d

Fix Javadoc

ee7b710

robinst mentioned this pull request Jul 7, 2024

Support for footnotes #273

Closed

robinst added 3 commits September 7, 2024 00:16

Merge remote-tracking branch 'origin/main' into footnotes-extension

e170d31

Add support for inline footnotes

72d6fa7

See e.g. https://pandoc.org/MANUAL.html#extension-inline_notes

robinst added 5 commits September 7, 2024 17:39

Inline footnotes rendering (first part)

257e4a4

Inline footnotes rendering finished

c6b4275

Rename param to allow implementations to use the normal node word

0dfa888

Remove old TODO

de53b03

Javadoc

e3e38ef

robinst merged commit c910105 into main Sep 12, 2024
11 of 12 checks passed

robinst deleted the footnotes-extension branch September 12, 2024 13:33
