Support for extending inline parsing with custom inline content parsers #321

robinst · 2024-04-26T08:43:52Z

This adds an API for users/extensions to extend inline parsing (or override some built-in inline parsing), similar to the current support for custom block parsers. Fixes:

Add support for inline subparsers #263

Overview

Since 5790505 we've internally extracted some of the inline parsing logic. This PR turns that into an API. It's currently exported in the beta package because it might be subject to change (and Scanner, which this depends on, is also in beta).

The entry point for this is Parser.Builder#customInlineContentParserFactory.

The registered factory needs to provide one or more "trigger characters". When such a character is encountered during inline parsing, the parser is called and can use a Scanner (obtained from the InlineParserState) to parse inline content from that position. It can decide not to return a result; in that case, other parsers (including built-in ones) are tried next.
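The dispatch rule described above can be modeled in plain Java. This is a simplified, self-contained sketch under stated assumptions, not the commonmark-java implementation: SketchParser, SketchFactory, Parsed and parseSnippet are hypothetical names; a parser declines by returning null (standing in for ParsedInline.none()); and any character that no custom parser claims simply becomes text (standing in for the built-in parsers).

```java
import java.util.*;

/** Hypothetical model of trigger-character dispatch (NOT the real commonmark-java classes). */
class TriggerDispatchSketch {

    /** Simplified result: the produced "node" and the index just after the parsed content. */
    static final class Parsed {
        final String node;
        final int end;

        Parsed(String node, int end) {
            this.node = node;
            this.end = end;
        }
    }

    /** Simplified parser: returns null to decline, so other parsers get a chance. */
    interface SketchParser {
        Parsed tryParse(String input, int start);
    }

    interface SketchFactory {
        Set<Character> triggerCharacters();
        SketchParser create();
    }

    /** Parses one inline snippet; parser instances are created fresh for each call. */
    static List<String> parseSnippet(String input, List<SketchFactory> factories) {
        // Build a trigger character -> parsers table; one instance per factory for this snippet
        Map<Character, List<SketchParser>> byTrigger = new HashMap<>();
        for (SketchFactory factory : factories) {
            SketchParser parser = factory.create();
            for (char c : factory.triggerCharacters()) {
                byTrigger.computeIfAbsent(c, k -> new ArrayList<>()).add(parser);
            }
        }

        List<String> nodes = new ArrayList<>();
        StringBuilder text = new StringBuilder();
        int i = 0;
        while (i < input.length()) {
            Parsed parsed = null;
            for (SketchParser parser : byTrigger.getOrDefault(input.charAt(i), List.of())) {
                parsed = parser.tryParse(input, i);
                if (parsed != null) {
                    break; // the first parser that returns a result wins
                }
            }
            if (parsed == null) {
                // Nobody claimed this character: fall back to plain text (stand-in for built-ins)
                text.append(input.charAt(i));
                i++;
            } else {
                if (text.length() > 0) {
                    nodes.add("Text(" + text + ")");
                    text.setLength(0);
                }
                nodes.add(parsed.node);
                i = parsed.end; // resume after the parsed content
            }
        }
        if (text.length() > 0) {
            nodes.add("Text(" + text + ")");
        }
        return nodes;
    }

    /** Mirrors the dollar example: "$...$" becomes Dollar(...), an unclosed '$' is declined. */
    static final class DollarFactory implements SketchFactory {
        public Set<Character> triggerCharacters() {
            return Set.of('$');
        }

        public SketchParser create() {
            return (input, start) -> {
                int close = input.indexOf('$', start + 1);
                if (close == -1) {
                    return null; // decline: no closing '$'
                }
                return new Parsed("Dollar(" + input.substring(start + 1, close) + ")", close + 1);
            };
        }
    }
}
```

For example, parseSnippet("a $b$ c", List.of(new DollarFactory())) yields [Text(a ), Dollar(b), Text( c)], while "cost $5" stays a single text node because the parser declines the unclosed '$'.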

Example

See InlineContentParserTest for an example of how to use it. Registering the factory:

commonmark-java/commonmark/src/test/java/org/commonmark/parser/InlineContentParserTest.java, line 20 in f481935:

```java
var parser = Parser.builder().customInlineContentParserFactory(new DollarInlineParser.Factory()).build();
```

And the factory and parser:

commonmark-java/commonmark/src/test/java/org/commonmark/parser/InlineContentParserTest.java, lines 56 to 86 in f481935:

```java
private static class DollarInlineParser implements InlineContentParser {

    // Per-instance state; a new parser instance is created for each inline snippet
    private int index = 0;

    @Override
    public ParsedInline tryParse(InlineParserState inlineParserState) {
        var scanner = inlineParserState.scanner();
        // Skip the opening '$' (the trigger character)
        scanner.next();
        var pos = scanner.position();

        // Advance to the closing '$'; if there is none, return no result
        var end = scanner.find('$');
        if (end == -1) {
            return ParsedInline.none();
        }
        var content = scanner.getSource(pos, scanner.position()).getContent();
        // Skip the closing '$' so inline parsing resumes after it
        scanner.next();
        // DollarInline is the test's custom node class (not shown here)
        return ParsedInline.of(new DollarInline(content, index++), scanner.position());
    }

    static class Factory implements InlineContentParserFactory {
        @Override
        public Set<Character> getTriggerCharacters() {
            return Set.of('$');
        }

        @Override
        public InlineContentParser create() {
            return new DollarInlineParser();
        }
    }
}
```

Design considerations

The trickiest part was deciding on the lifetime of the parser instance. There were a few options:

The parser takes an instance (instead of a factory) which is used for all parsed documents. That would mean the instance has to be stateless, as it could be called concurrently from multiple parses. Care has been taken to make it safe to use a single parser from multiple threads, so this would go against that.
The parser takes a factory, and for each parsed document, one instance is created. While this solves the above problem, there is the following quote in the spec: "Note that the first step requires processing lines in sequence, but the second can be parallelized, since the inline parsing of one block element does not affect the inline parsing of any other." If we only had a single parser instance per document, we wouldn't be able to parallelize inline parsing in the future. There's also another problem: if an instance wants to keep state, it would probably be per inline snippet. To allow that, we would need to add some kind of "reset" method to signal that a snippet has finished.
The parser takes a factory, and for each parsed inline snippet, an instance is created. This would allow parallelizing inline parsing, and is also a useful granularity if the parser wants to keep state. Note that e.g. link parsing is not yet using this mechanism, but with this, it could. (This is the option that was chosen.)
The parser takes a factory, and for each trigger character, an instance is created. This seems too small a scope, and wouldn't allow the instance to keep any state across multiple parses.
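To illustrate why the per-snippet lifetime (the chosen option) matters for stateful parsers, here is a minimal sketch assuming a parser that numbers its $...$ spans like the index field in DollarInlineParser. CountingDollarParser and parseSnippets are hypothetical names, and a java.util.function.Supplier stands in for InlineContentParserFactory.create(). Passing a constructor reference gives each snippet a fresh instance; returning one shared instance instead reproduces the first option, where state leaks from one snippet to the next.

```java
import java.util.*;
import java.util.function.Supplier;

/** Hypothetical sketch of "one parser instance per inline snippet" (NOT the real API). */
class PerSnippetLifetimeSketch {

    /** A stateful parser that numbers each $...$ span it sees. */
    static final class CountingDollarParser {
        private int index = 0;

        List<String> parseAll(String snippet) {
            List<String> out = new ArrayList<>();
            int i = snippet.indexOf('$');
            while (i != -1) {
                int close = snippet.indexOf('$', i + 1);
                if (close == -1) {
                    break; // unclosed '$': no span (the real parser would return ParsedInline.none())
                }
                out.add(snippet.substring(i + 1, close) + "#" + index++);
                i = snippet.indexOf('$', close + 1);
            }
            return out;
        }
    }

    /** Calls the factory once per snippet, so per-instance state starts over for each snippet. */
    static List<List<String>> parseSnippets(List<String> snippets, Supplier<CountingDollarParser> factory) {
        List<List<String>> result = new ArrayList<>();
        for (String snippet : snippets) {
            result.add(factory.get().parseAll(snippet));
        }
        return result;
    }
}
```

With CountingDollarParser::new, the snippets "$a$ $b$" and "$c$" yield [a#0, b#1] and [c#0]; with a single shared instance the second snippet yields [c#2] instead.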

Some other thoughts:

Why a Set<Character> for trigger characters instead of just a single char? If link parsing were implemented on top of this, it would require at least [ and ] as trigger characters, so let's allow multiple. There's not really a downside to allowing multiple.
Could a parser return more than a single Node? Not with the current API, but we could add that in a backwards-compatible way.
Why another Factory? Not sure what the alternative would be, and we already have the same concept for BlockParserFactory :).
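The reasoning for a Set<Character> can be illustrated with a bracket-matching parser that has to react to both [ and ]. This is a hypothetical sketch (MultiTriggerSketch, BracketParser and bracketedSpans are made-up names, and this is not how commonmark-java parses links): a single instance, created once per snippet, handles both trigger characters and keeps a stack of open brackets between calls.

```java
import java.util.*;

/** Hypothetical sketch: one stateful parser instance serving two trigger characters. */
class MultiTriggerSketch {

    static final Set<Character> TRIGGERS = Set.of('[', ']');

    /** Remembers where each '[' opened; a ']' closes the most recent unmatched '['. */
    static final class BracketParser {
        private final Deque<Integer> openers = new ArrayDeque<>();

        /** Returns the bracketed text when a ']' closes an earlier '[', otherwise null. */
        String onTrigger(String input, int pos) {
            if (input.charAt(pos) == '[') {
                openers.push(pos);
                return null;
            }
            if (openers.isEmpty()) {
                return null; // ']' without a matching '[': decline
            }
            int opener = openers.pop();
            return input.substring(opener + 1, pos);
        }
    }

    static List<String> bracketedSpans(String snippet) {
        BracketParser parser = new BracketParser(); // one instance per snippet
        List<String> spans = new ArrayList<>();
        for (int i = 0; i < snippet.length(); i++) {
            if (TRIGGERS.contains(snippet.charAt(i))) {
                String span = parser.onTrigger(snippet, i);
                if (span != null) {
                    spans.add(span);
                }
            }
        }
        return spans;
    }
}
```

For example, bracketedSpans("see [docs] and ]x[") returns [docs]: the stray ] is declined and the trailing [ is never closed. With only a single trigger character per parser, the opening and closing halves would have to live in separate parsers that could not share this stack.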

robinst added 8 commits April 16, 2024 00:08

Add getTriggerCharacter to InlineContentParser, calculate inline parsers

d8fce11

Cleanups

1c0259d

Add customInlineContentParser and use in inline parsing

0dc0c2e

Add factory so that inline parsers can keep state

0773541

Allow to specify multiple trigger characters

e7d7bcd

Move inline content parser to beta API package

eeb0776

Rename Parser.Builder method to align with type

d876efe

Add CHANGELOG

f481935

robinst mentioned this pull request Apr 26, 2024

Allow to add inline parsers to built-in implementation #319

Closed

robinst added 2 commits April 26, 2024 19:02

Add some more Javadoc with links

0fd2427

README: Add section about customizing parsing

6b16c69

robinst merged commit 3733963 into main Apr 26, 2024
10 checks passed

robinst deleted the robinst-inline-content-parser branch April 26, 2024 12:33

robinst mentioned this pull request Apr 26, 2024

Add support for inline subparsers #263

Closed
