Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Parser simplification and Closure Shortcut features #7212

Closed
wants to merge 252 commits into from

Conversation

dadhi
Copy link

@dadhi dadhi commented Nov 10, 2024

Simplify Parser and add closure shortcut features

Initial intent

I wanted to add a fun feature, a syntax sugar for the closure of single argument with pipe,
where this code \|> foo translates to \x -> x |> foo.

To identify the place where I can add the feature, I have started from the parser.
My debugging experience with Roc parser was miserable. TL;DR; too many nested calls, closures, and macros.
So I've started to unwind the Parser from the original closure_help function and ended up with this PR :-)

Goals

Done

  • Simplify Parser for performance, debug-ability and addition of the new features
  • Implement closure shortcut for all Binary operators
  • Implement reformat to full closure form guided by the space after slash \ -
  • Implement closure shortcut for the field and tuple access, e.g. \.foo, \.foo.bar or \.0.buzz. Note: consider how this functionality adds-up to the current RecordAccessor .foo { foo: 3 } and RecordUpdater &foo { foo: 1 } 42 functions.
  • Implement shortcut for identity function i -> i, via \.
  • Implement x ~ ... binary operator for the pattern matching when x is ...
  • Implement closure shortcut for the when pattern matching using the ~ operator via \~
  • Generate unique names for the parameter in closure shortcut, because Roc does not support shadowing yet
  • Implement shortcut for Unary Negate and Not operators in a form of \-. and \!. This syntax should be composable with the record/tuple field access, e.g. \!.foo.bar, \-.a + y.b, etc.
  • Fix all @fixme
  • Complete or convert all @wip

Further

  • Replace and remove all combinators and their tracing code. Seems reallistic and a small task to do.
  • 'roc format` option to expand closure shortcuts

Random

  • Think how to optimize the Parser further. Do the low-hanging things. Mark potential places with todo: @perf
  • Optimize the tests running speed, currently cargo test --release takes 5 min on my laptop, or e.g. time cargo test -p roc_load --test test_reporting takes 20s. Whaterver it does, it is a lot.

PR does not

It is not a serious optimization of the parser.

I was refactoring and moving existing parts around, removing what become redundant in the process.

Mmm, OK. I did a small optimization/non-pessimization (TM) using the opportunities emerging from the inlining and replacing the combinators with the direct function calls. For example, parse_term checks the first character before calling the appropriate parsing function for the rest of the input.

PR does

  • Features. See in the corresponding section below. Features implementation is localized in the Parser and for the small bits in the Formatter. This simplifies removing or splitting them from this PR.
  • Simplifies Parser by removing closures, macros, deep call chains, unused code, wrapping and unwrapping the results between abstraction layers. For me personally, it simplified the debugging and code understanding.
  • Speeds up compiler compilation because of simpler code and less LOC. I am inlining things and less LOC seems counterintuitive, right?
  • After inlining the leaky abstractions are gone!
  • After inlining, I have found that some errors states were never produced, so the handling of them is unreachable and may be safely removed, e.g. EClosure::Comma, EWhen::Bar, EWhen::IfToken, and many more.
  • Removes unnecessary and double checks, a prominent one being mutiple checks if identifier is not a keyword.
  • Opens room for the further optimizations, because you colocate things and now see the whole picture (locality of behavior ftw).
  • After inlining, the names were adapted from the abstract and general to more concrete and contextual.
  • In its current state, the Parser style is not consistent - in one place it is declarative/combinatorial in others it is imperative or mixed (where combinators become complex). This PR removes the most of combinators (and the cost of them), moving toward the pure imperative style, using just functions and calls with minimum macro use.
  • After looking at the inlined invocation I have found repeated small chunks of infra code, so I wrapped them in the sugar functions. For example, I have added spaced_before, spaced_around abstracting the repetition of the same code in many places.

Parser Benchmarks

cd ./crates/compiler/parse && cargo bench

main branch:

image

PR branch:

image

New debug experience

Comparing the call-stacks when debugging the test from the test_syntax:

    #[test]
    fn single_line_string_literal_in_pattern() {
        expr_formats_same(indoc!(
            r#"
            when foo is
                "abc" -> ""
            "#
        ));
    }

Breakpoint is on the construction of the AST node for the WhenBranch.

In the main branch (barely fitting on my laptop screen with the smallish font):

image

In this PR branch:

image

Features

Closure shortcut for the binary and unary operators, record field and tuple accessors, identity function

It is working for all binary and unary operators, including the Pizza pipe operator |> and the new ~ operator for the pattern matching, the record field, tuple accessors and the identity function.
Plus, those shortcuts are naturally combining, e.g. unary negate with identity and access, then further with the binary operator chain.

Tests for illustration:

    #[test]
    fn func_call_trailing_lambda_pizza_shortcut() {
        expr_formats_same(indoc!(
            r"
                list = List.map [1, 2, 3] \|> Num.add 1
                list
            "
        ));
    }

    #[test]
    fn func_call_trailing_lambda_plus_shortcut() {
        expr_formats_same(indoc!(
            r"
                list = List.map [1, 2, 3] \+ 1
                list
            "
        ));
    }
    
    #[test]
    fn closure_shortcut_for_when_binop() {
        expr_formats_same(indoc!(
            r#"
            \~
                "abc" -> ""
                _ -> "abc"
            "#
        ));
    }

    #[test]
    fn closure_shortcut_for_tuple_index_access() {
        expr_formats_same(indoc!(
            r"
            \.0
            "
        ));
        expr_formats_same(indoc!(
            r"
            \-.0
            "
        ));
        expr_formats_same(indoc!(
            r"
            \!.0
            "
        ));
    }

    
    #[test]
    fn closure_shortcut_for_field_tuple_index_access() {
        expr_formats_same(indoc!(
            r"
            \.bru.0
            "
        ));
    }

    #[test]
    fn closure_shortcut_for_field_as_part_of_bin_op() {
        expr_formats_same(indoc!(
            r"
            \.foo.bar + 1
            "
        ));
        expr_formats_same(indoc!(
            r"
            \-.foo.bar + 1
            "
        ));
    }

    #[test]
    fn closure_shortcut_for_identity_function() {
        expr_formats_same(indoc!(
            r"
            \.
            "
        ));
    }
    
    #[test]
    fn closure_shortcut_unary_access_with_binop_chain() {
        expr_formats_same(indoc!(
            r#"
            \-.foo.bar + 5 ~
                42 -> ""
                _ -> "42"
            "#
        ));
    }

Current implementation changes the Parser to create the same AST for shortcut \|> bar as for the normal \x -> x |> bar.
I am still saving the information that the closure is in fact a shortcut via the additional field in Expr::Closure.
When the flag is set, the Roc format is adjusted to output the AST in the short form

User-guided shortcut expansion

By inserting the whitespace after \ in the closure shortcut, the User indicates the expansion into the full form on the roc format:

Say you've typed \|> shortcut, then you can put space between opening \ and |> in \ |> foo and it will expand into \x -> x |> foo on format (automatically with format-on-save and auto-save enabled in the editor).

  • First, it provides a uniform way to understand the shortcut meaning, you may see for yourself by typing space, then removing space to go back to the shortcut form or keep the expanded form.
  • Second, the feature may work as a kind of 'editor shortcut' feature. Shortcuts are expanded into the full form while you're typing. Interesting part, that it is supported by the language itself without need for special editor plugins or tooling. IMHO, it matches the Roc philosophy about tooling in a language, and personally it is just a fun feature I did not see anywhere else.

Example:

    #[test]
    fn closure_shortcut_for_tuple_index_access_fmt_expand() {
        expr_formats_to(
            indoc!(
                r"
            \ .0
            "
            ),
            indoc!(
                r"
            \x -> x.0
            "
            ),
        );
        expr_formats_to(
            indoc!(
                r"
            \ -.0
            "
            ),
            indoc!(
                r"
            \x -> -x.0
            "
            ),
        );
        expr_formats_to(
            indoc!(
                r"
            \ !.0
            "
            ),
            indoc!(
                r"
            \x -> !x.0
            "
            ),
        );
    }

   #[test]
    fn closure_shortcut_for_identity_function_format() {
        expr_formats_to(
            indoc!(
                r"
            \ .
            "
            ),
            indoc!(
                r"
            \x -> x
            "
            ),
        );
    }

Binary operator for the when pattern matching

Implemented as ~ symbol, which is not yet used by the other features.
The difference from the |> is that it should be the last in the chain of the binary operators.

    #[test]
    fn basic_pattern_binop() {
        expr_formats_same(indoc!(
            r#"
            foo ~
                "abc" -> ""
                _ -> "abc"
            "#
        ));
    }

    #[test]
    fn pattern_binop_with_operator_chain_equivalents() {
        expr_formats_same(indoc!(
            r#"
            a + b + c ~
                42 -> ""
                _ -> "42"
            "#
        ));
        expr_formats_same(indoc!(
            r#"
            (a + b + c) ~
                42 -> ""
                _ -> "42"
            "#
        ));
    }

dadhi added 30 commits June 10, 2024 17:05
@skyqrose
Copy link
Collaborator

This has a lot of pretty impactful changes to the language that haven't been through any discussion or feedback from the community. It looks like some fun stuff to experiment with, but you shouldn't expect these sorts of changes to be merged into the language without getting general consensus on them first.

@lukewilliamboswell
Copy link
Collaborator

I think there are some promising ideas and improvements here, however as @skyqrose has mentioned these changes have not been discussed with the community and they will have a significant impact on the language.

These kinds of changes to the core language are typically discussed in great length under the guidance of @rtfeldman.

I recommend you checkout our (informal) process for managing PRs here. As @bhansconnect mentioned in zulip, you should consider splitting this up into separate proposals. I can imagine some of the improvements and simplifications would be most appreciated.

I am closing this PR as it will not be accepted in this form. Again, thank you for sharing this and I hope we can land some of these improvements.

@dadhi
Copy link
Author

dadhi commented Nov 13, 2024

@lukewilliamboswell will you be interested if I split the PR into performance improvements and simplification of the parser, and keep the features aside?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants