Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Context/State? #164

Open
binarycow opened this issue Sep 2, 2024 · 5 comments
Open

Context/State? #164

binarycow opened this issue Sep 2, 2024 · 5 comments

Comments

@binarycow
Copy link
Contributor

FParsec (an F# parser combinator library) has the concept of "user state", which can be used to maintain state between parsers.

Is there any such concept in Superpower? If not, it makes it harder to do grammars that aren't context-free.

@binarycow
Copy link
Contributor Author

In my specific case, I have a language which has "statements" - all of which follow the same structure: kind, argument, then children.

But which children (and how many of those children) are allowed depends on the kind of statement. For example, not all statements allow a description statement as a child - and the ones that do, only one description statement is allowed.

It would be nice if the parsing process was able to keep track of how many statements of that type it has encountered in this scope, and produce an "unexpected" error if it encounters an extra one.

@binarycow
Copy link
Contributor Author

If this is not currently possible, would you be opposed if it was added? (I'd be willing to do most of the work)

In general, it would consist of:

  1. Adding "result" types (Result<TUserState, TResult> and TokenListParserResult<TUserState, TResult>)
  2. Add another delegate types (TextParser<TUserState, TResult>, TokenListParser<TUserState, TKind, T>) each which accept a user state in addition to their input
  3. Adding equivalents to the other methods for those new delegate types.
public struct Result<TUserState, TResult> { /* Implementation ommitted */ }
public delegate Result<TUserState, TResult> TextParser<TUserState, TResult>(TextSpan input, TUserState userState);

In an ideal world, the non-user state methods would simply redirect to the user state methods. For example (assuming the necessary methods exist):

static TextParser<char> Matching(Func<char, bool> predicate, string[] expectations) 
    => Matching<Unit>(predicate, expectations).WithoutState();

static TextParser<char> Matching(Func<char, bool> predicate, string[] expectations)
{
    if (predicate == null) throw new ArgumentNullException(nameof(predicate));
    if (expectations == null) throw new ArgumentNullException(nameof(expectations));

    return input =>
    {
        var next = input.ConsumeChar();
        if (!next.HasValue || !predicate(next.Value))
            return Result.Empty<char>(input, expectations);

        return next;
    };
}

But, if performance reasons dictate that doesn't happen (e.g., extra closures), then a copy/paste with the necessary modifications would also work, albeit with more code.

@nblumhardt
Copy link
Member

Hi! Currently there are a few ways to implement this, I'm not sure they're quite as good as a built-in feature, but I don't think the gap is enough to warrant broad modifications at this point so hoping this helps:

  1. Parser objects

If you declare your parsers as instance fields or properties of a class (instead of statics), you can use other fields in the class to hold state - in plain variables, or stacks, if needed. A bit disruptive in how the parser is implemented, and needs some additional per-parse allocation, but pretty flexible.

A variation on this is to use AsyncLocal to hold state, and push/pop in the parsers that produce/consume it.

  1. Closure state

Linq expressions (nested closures) make it fairly easy to carry state from one parser to another, later one:

... MyParser { get; } =
     from a in ParserWithStateInReturn
     // ... more parsers in sequence
     from z in ParserThatNeedsState(a.State)
     // ...
  1. Post-parse semantic analysis

This is probably the nicest option in the long term, if your grammar is not truly context-sensitive: just allow the parse, and then validate the resulting AST before moving on to further processing. Might take some work to produce good error messages (things that may fail validation will need to be annotated with positions/line numbers).

When I say "context-sensitive" here I mean whether the syntax structure actually changes depending on context. If your parser needs to handle one language within another, or something of similar ferocious complexity, you can't usually get around that without a context-dependent parser. But, if the grammatical structure is still essentially the same in the child parsers you mention, regardless of their parent, then semantic analysis is the best thing to reach for.

HTH!

@binarycow
Copy link
Contributor Author

A variation on this is to use AsyncLocal to hold state, and push/pop in the parsers that produce/consume it.

Ah! That's an interesting idea. Not sure how I feel about this though... not to mention any implications that may arise from it.

If you declare your parsers as instance fields or properties of a class
Closure state

Then I'd essentially end up allocating 60-100 objects for parsing a single file, because each of my statement parsers would be an allocation (there's ~60), plus any ancillary parsers that I make to assist. Which can be avoided by just passing a state value throughout. Yes, I'd have one allocation (the state value), and even then, that could possibly be avoided if it's a struct.

When I say "context-sensitive" here I mean whether the syntax structure actually changes depending on context. If your parser needs to handle one language within another, or something of similar ferocious complexity, you can't usually get around that without a context-dependent parser.

Both of my use cases involve a language within a language. Yes, I can work around it, but it's annoying, knowing that a simple state value passed through would solve the problem.

IMO, without a state capability, post-parse semantic analysis is the only option. And I've got enough semantic analysis I gotta deal with, I was hoping to have less passes, not more.

@nblumhardt
Copy link
Member

A variation on this is to use AsyncLocal to hold state, and push/pop in the parsers that produce/consume it.

Ah! That's an interesting idea. Not sure how I feel about this though... not to mention any implications that may arise from it.

Seems like the one to explore, though - no added overhead for existing parsers, and no code changes required, mean it'll be a lot quicker to pull together.

Built-in user parsing state is a useful feature, but with other options within reach I don't think it'd motivate a change this broad at this stage.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants