Parser simplification and Closure Shortcut features #7212

dadhi · 2024-11-10T22:13:48Z

Simplify Parser and add closure shortcut features

Initial intent

I wanted to add a fun feature: syntax sugar for a single-argument closure that starts with a pipe,
so that \|> foo translates to \x -> x |> foo.

To find the place where I could add the feature, I started from the parser.
My debugging experience with the Roc parser was miserable. TL;DR: too many nested calls, closures, and macros.
So I started to unwind the Parser from the original closure_help function and ended up with this PR :-)

Goals

Done

Further

Replace and remove all combinators and their tracing code. This seems realistic and a small task.
A `roc format` option to expand closure shortcuts

Random

Think about how to optimize the Parser further. Do the low-hanging things. Mark potential places with todo: @perf
Optimize the test running speed: currently cargo test --release takes 5 min on my laptop, and e.g. time cargo test -p roc_load --test test_reporting takes 20s. Whatever it does, it is a lot.

PR does not

It is not a serious optimization of the parser.

I was refactoring and moving existing parts around, removing what became redundant in the process.

Mmm, OK. I did a small optimization/non-pessimization (TM) using the opportunities that emerged from inlining and from replacing the combinators with direct function calls. For example, parse_term checks the first character before calling the appropriate parsing function for the rest of the input.
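
To illustrate the first-character dispatch mentioned above, here is a minimal sketch. It is not the parse_term from this PR: the `Term` enum, the function names, and the toy grammar (strings without escapes, decimal digits, ASCII identifiers) are all assumptions made for the example.

```rust
/// Toy term type; hypothetical, not the Roc AST.
#[derive(Debug, PartialEq)]
pub enum Term<'a> {
    Str(&'a str),
    Num(&'a str),
    Ident(&'a str),
}

/// Look at the first byte once, then call the single parser that can succeed,
/// instead of trying each alternative in turn.
/// Returns the parsed term and the rest of the input.
pub fn parse_term_sketch(input: &str) -> Option<(Term<'_>, &str)> {
    match *input.as_bytes().first()? {
        b'"' => parse_str(input),
        b'0'..=b'9' => parse_num(input),
        b'a'..=b'z' | b'A'..=b'Z' | b'_' => parse_ident(input),
        _ => None,
    }
}

fn parse_str(input: &str) -> Option<(Term<'_>, &str)> {
    // Skip the opening quote and read up to the closing one (no escapes in this toy).
    let body = &input[1..];
    let end = body.find('"')?;
    Some((Term::Str(&body[..end]), &body[end + 1..]))
}

fn parse_num(input: &str) -> Option<(Term<'_>, &str)> {
    // Consume the leading run of ASCII digits.
    let end = input
        .find(|c: char| !c.is_ascii_digit())
        .unwrap_or(input.len());
    Some((Term::Num(&input[..end]), &input[end..]))
}

fn parse_ident(input: &str) -> Option<(Term<'_>, &str)> {
    // Consume the leading run of identifier characters.
    let end = input
        .find(|c: char| !(c.is_ascii_alphanumeric() || c == '_'))
        .unwrap_or(input.len());
    Some((Term::Ident(&input[..end]), &input[end..]))
}
```

Calling `parse_term_sketch("42 + 1")` goes straight to `parse_num` without attempting the string or identifier parsers first, which is the kind of saving the paragraph describes.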

PR does

Features. See the corresponding section below. The feature implementation is localized in the Parser, with small bits in the Formatter. This simplifies removing the features or splitting them out of this PR.
Simplifies the Parser by removing closures, macros, deep call chains, unused code, and the wrapping and unwrapping of results between abstraction layers. For me personally, it simplified debugging and understanding the code.
Speeds up compiler compilation because of simpler code and fewer LOC. I am inlining things, so fewer LOC seems counterintuitive, right?
After inlining, the leaky abstractions are gone!
After inlining, I found that some error states were never produced, so their handling is unreachable and may be safely removed, e.g. EClosure::Comma, EWhen::Bar, EWhen::IfToken, and many more.
Removes unnecessary and duplicate checks, a prominent one being the multiple checks that an identifier is not a keyword.
Opens room for further optimizations, because you colocate things and now see the whole picture (locality of behavior ftw).
After inlining, the names were changed from abstract and general to more concrete and contextual.
In its current state, the Parser style is not consistent: in one place it is declarative/combinatorial, in others it is imperative or mixed (where combinators become complex). This PR removes most of the combinators (and their cost), moving toward a purely imperative style that uses just functions and calls with minimal macro use.
After looking at the inlined invocations, I found repeated small chunks of infra code, so I wrapped them in sugar functions. For example, I added spaced_before and spaced_around, abstracting the repetition of the same code in many places.
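
As a rough idea of what small "sugar" helpers of this kind can look like, here is a sketch. The signatures of spaced_before and spaced_around in the PR are not shown in this description, so everything below (the `Parsed` alias, the `_sketch` names, string-slice input instead of parser state, and the `arrow` parser) is hypothetical.

```rust
/// Hypothetical parser result: the parsed value plus the remaining input.
type Parsed<'a, T> = Option<(T, &'a str)>;

/// Skip leading spaces, tabs and newlines (a real parser would also handle comments).
fn skip_spaces(input: &str) -> &str {
    input.trim_start_matches(|c: char| c == ' ' || c == '\t' || c == '\n')
}

/// Run `inner` after skipping the whitespace in front of it.
pub fn spaced_before_sketch<'a, T>(
    input: &'a str,
    inner: impl Fn(&'a str) -> Parsed<'a, T>,
) -> Parsed<'a, T> {
    inner(skip_spaces(input))
}

/// Run `inner` with whitespace skipped on both sides.
pub fn spaced_around_sketch<'a, T>(
    input: &'a str,
    inner: impl Fn(&'a str) -> Parsed<'a, T>,
) -> Parsed<'a, T> {
    let (value, rest) = spaced_before_sketch(input, inner)?;
    Some((value, skip_spaces(rest)))
}

/// Toy inner parser: matches the closure arrow `->`.
pub fn arrow(input: &str) -> Parsed<'_, ()> {
    input.strip_prefix("->").map(|rest| ((), rest))
}
```

With these, `spaced_around_sketch("  ->  x", arrow)` consumes the arrow and the surrounding whitespace in one call, so each call site does not have to repeat the skip-parse-skip sequence.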

Parser Benchmarks

cd ./crates/compiler/parse && cargo bench

main branch:

PR branch:

New debug experience

Comparing the call stacks when debugging this test from test_syntax:

    #[test]
    fn single_line_string_literal_in_pattern() {
        expr_formats_same(indoc!(
            r#"
            when foo is
                "abc" -> ""
            "#
        ));
    }

Breakpoint is on the construction of the AST node for the WhenBranch.

In the main branch (barely fitting on my laptop screen with the smallish font):

In this PR branch:

Features

Closure shortcut for the binary and unary operators, record field and tuple accessors, identity function

It works for all binary and unary operators, including the pizza pipe operator |> and the new ~ operator for pattern matching, as well as for record field accessors, tuple accessors, and the identity function.
Plus, those shortcuts combine naturally, e.g. unary negate with identity and access, then further with a binary operator chain.

Tests for illustration:

    #[test]
    fn func_call_trailing_lambda_pizza_shortcut() {
        expr_formats_same(indoc!(
            r"
                list = List.map [1, 2, 3] \|> Num.add 1
                list
            "
        ));
    }

    #[test]
    fn func_call_trailing_lambda_plus_shortcut() {
        expr_formats_same(indoc!(
            r"
                list = List.map [1, 2, 3] \+ 1
                list
            "
        ));
    }
    
    #[test]
    fn closure_shortcut_for_when_binop() {
        expr_formats_same(indoc!(
            r#"
            \~
                "abc" -> ""
                _ -> "abc"
            "#
        ));
    }

    #[test]
    fn closure_shortcut_for_tuple_index_access() {
        expr_formats_same(indoc!(
            r"
            \.0
            "
        ));
        expr_formats_same(indoc!(
            r"
            \-.0
            "
        ));
        expr_formats_same(indoc!(
            r"
            \!.0
            "
        ));
    }

    
    #[test]
    fn closure_shortcut_for_field_tuple_index_access() {
        expr_formats_same(indoc!(
            r"
            \.bru.0
            "
        ));
    }

    #[test]
    fn closure_shortcut_for_field_as_part_of_bin_op() {
        expr_formats_same(indoc!(
            r"
            \.foo.bar + 1
            "
        ));
        expr_formats_same(indoc!(
            r"
            \-.foo.bar + 1
            "
        ));
    }

    #[test]
    fn closure_shortcut_for_identity_function() {
        expr_formats_same(indoc!(
            r"
            \.
            "
        ));
    }
    
    #[test]
    fn closure_shortcut_unary_access_with_binop_chain() {
        expr_formats_same(indoc!(
            r#"
            \-.foo.bar + 5 ~
                42 -> ""
                _ -> "42"
            "#
        ));
    }

The current implementation changes the Parser to create the same AST for the shortcut \|> bar as for the normal \x -> x |> bar.
I still record that the closure is in fact a shortcut, via an additional field in Expr::Closure.
When the flag is set, roc format outputs the AST in the short form.
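
The idea in the paragraph above can be sketched with a toy AST. This is not the real `Expr` from the Roc parser: the enum, the `shortcut` field name, and both functions are hypothetical, and the fixed parameter name `x` stands in for the unique name the PR generates.

```rust
/// Toy expression tree; hypothetical and much smaller than the real `Expr`.
#[derive(Debug, Clone, PartialEq)]
pub enum Expr {
    Var(String),
    /// `left |> right`
    Pipe(Box<Expr>, Box<Expr>),
    /// `\param -> body`; `shortcut` records that the source used the short form.
    Closure { param: String, body: Box<Expr>, shortcut: bool },
}

/// Build the tree for `\|> func`: an ordinary closure whose body pipes the
/// generated parameter into `func`, tagged as a shortcut.
pub fn desugar_pipe_shortcut(func: Expr) -> Expr {
    let param = "x".to_string(); // stand-in for a generated unique name
    Expr::Closure {
        body: Box::new(Expr::Pipe(Box::new(Expr::Var(param.clone())), Box::new(func))),
        param,
        shortcut: true,
    }
}

/// Print an expression; a tagged closure is printed back in its short form.
pub fn format_expr(expr: &Expr) -> String {
    match expr {
        Expr::Var(name) => name.clone(),
        Expr::Pipe(left, right) => format!("{} |> {}", format_expr(left), format_expr(right)),
        Expr::Closure { param, body, shortcut } => match (&**body, *shortcut) {
            // Short form: drop the parameter and the arrow, keep the operator tail.
            (Expr::Pipe(left, right), true) if **left == Expr::Var(param.clone()) => {
                format!("\\|> {}", format_expr(right))
            }
            // Everything else is printed as a normal closure.
            _ => format!("\\{} -> {}", param, format_expr(body)),
        },
    }
}
```

Because the body of the tagged closure is identical to the hand-written one, later compiler stages would not need to know about the shortcut; only the formatter consults the flag.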

User-guided shortcut expansion

By inserting whitespace after \ in a closure shortcut, the user asks roc format to expand it into the full form:

Say you've typed the \|> shortcut. You can then put a space between the opening \ and |>, as in \ |> foo, and it will expand into \x -> x |> foo on format (automatically, with format-on-save and auto-save enabled in the editor).

First, it provides a uniform way to understand what a shortcut means: you can see for yourself by typing the space, then either removing the space to go back to the shortcut form or keeping the expanded form.
Second, the feature may work as a kind of 'editor shortcut': shortcuts are expanded into the full form while you're typing. The interesting part is that it is supported by the language itself, without the need for special editor plugins or tooling. IMHO, it matches the Roc philosophy about tooling in a language, and personally it is just a fun feature I have not seen anywhere else.

Example:

    #[test]
    fn closure_shortcut_for_tuple_index_access_fmt_expand() {
        expr_formats_to(
            indoc!(
                r"
            \ .0
            "
            ),
            indoc!(
                r"
            \x -> x.0
            "
            ),
        );
        expr_formats_to(
            indoc!(
                r"
            \ -.0
            "
            ),
            indoc!(
                r"
            \x -> -x.0
            "
            ),
        );
        expr_formats_to(
            indoc!(
                r"
            \ !.0
            "
            ),
            indoc!(
                r"
            \x -> !x.0
            "
            ),
        );
    }

    #[test]
    fn closure_shortcut_for_identity_function_format() {
        expr_formats_to(
            indoc!(
                r"
            \ .
            "
            ),
            indoc!(
                r"
            \x -> x
            "
            ),
        );
    }
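
The expansion rule exercised by the tests above can be mimicked on plain text. The sketch below is only an illustration of the rule and covers accessor shortcuts alone; the PR works on the AST rather than on strings, and the function name and the parameter name `x` are assumptions.

```rust
/// Text-level toy: expand `\ .0`, `\ -.0`, `\ !.0` and `\ .` into the full
/// closure form, and leave the short form untouched when there is no space
/// after the backslash. Returns `None` for input this toy does not cover.
pub fn format_access_shortcut(src: &str) -> Option<String> {
    let tail = src.trim_end().strip_prefix('\\')?;
    // No space after the backslash: keep the short form as written.
    let Some(tail) = tail.strip_prefix(' ') else {
        return Some(format!("\\{tail}"));
    };
    let tail = tail.trim_start();
    // Optional unary operator in front of the accessor.
    let (unary, access) = match tail.chars().next()? {
        '-' | '!' => (&tail[..1], &tail[1..]),
        _ => ("", tail),
    };
    // The accessor must start with `.`; a lone `.` is the identity function.
    let fields = access.strip_prefix('.')?;
    let body = if fields.is_empty() {
        "x".to_string()
    } else {
        format!("x.{fields}")
    };
    Some(format!("\\x -> {unary}{body}"))
}
```

For example, `format_access_shortcut("\\ -.0")` yields the text `\x -> -x.0`, while `format_access_shortcut("\\.0")` returns the shortcut unchanged.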

Binary operator for the when pattern matching

Implemented as the ~ symbol, which is not yet used by other features.
The difference from |> is that it must be the last operator in a chain of binary operators.

    #[test]
    fn basic_pattern_binop() {
        expr_formats_same(indoc!(
            r#"
            foo ~
                "abc" -> ""
                _ -> "abc"
            "#
        ));
    }

    #[test]
    fn pattern_binop_with_operator_chain_equivalents() {
        expr_formats_same(indoc!(
            r#"
            a + b + c ~
                42 -> ""
                _ -> "42"
            "#
        ));
        expr_formats_same(indoc!(
            r#"
            (a + b + c) ~
                42 -> ""
                _ -> "42"
            "#
        ));
    }
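
The "last in the chain" rule can be pictured with a toy that works on an already tokenized operator chain. This is an assumption-laden sketch, not the PR's parse_bin_op: the function name and the flat token representation are hypothetical.

```rust
/// Split a flat token list such as `a + b + c ~` into the operator chain that
/// acts as the scrutinee and a flag saying whether a trailing `~` was present.
/// Returns an error if `~` appears anywhere but at the end of the chain.
pub fn split_when_operator<'a>(tokens: &[&'a str]) -> Result<(Vec<&'a str>, bool), String> {
    match tokens.iter().position(|t| *t == "~") {
        // No `~` at all: an ordinary operator chain.
        None => Ok((tokens.to_vec(), false)),
        // `~` is the final token: everything before it is matched on as a whole.
        Some(i) if i + 1 == tokens.len() => Ok((tokens[..i].to_vec(), true)),
        // `~` followed by more operators or operands is rejected.
        Some(i) => Err(format!("`~` at token {i} must end the operator chain")),
    }
}
```

Under this reading the whole chain before `~` is the value being matched, which is consistent with the test above where `a + b + c ~` and `(a + b + c) ~` are both accepted with the same branches.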

skyqrose · 2024-11-10T23:33:14Z

This has a lot of pretty impactful changes to the language that haven't been through any discussion or feedback from the community. It looks like some fun stuff to experiment with, but you shouldn't expect these sorts of changes to be merged into the language without getting general consensus on them first.

lukewilliamboswell · 2024-11-12T21:15:15Z

I think there are some promising ideas and improvements here. However, as @skyqrose has mentioned, these changes have not been discussed with the community, and they would have a significant impact on the language.

These kinds of changes to the core language are typically discussed at great length under the guidance of @rtfeldman.

I recommend you check out our (informal) process for managing PRs here. As @bhansconnect mentioned in Zulip, you should consider splitting this up into separate proposals. I can imagine some of the improvements and simplifications would be most appreciated.

I am closing this PR as it will not be accepted in this form. Again, thank you for sharing this and I hope we can land some of these improvements.

dadhi · 2024-11-13T16:47:00Z

@lukewilliamboswell would you be interested if I split the PR into performance improvements and simplification of the parser, and kept the features aside?

dadhi added 30 commits June 10, 2024 17:05

Merge branch 'roc-lang:main' into main

33be4d1

Merge branch 'roc-lang:main' into main

3a4316d

Merge branch 'roc-lang:main' into main

0d680dc

Revert "feat: better error msg"

f6b30a1

This reverts commit ea23576.

adding the test to pass

a780642

simplify-inline closure_help

855267d

simplify-inline+remove byte_indent

3b1e257

Merge branch 'roc-lang:main' into main

5a6de63

Merge branch 'main' into one_arg_closure_pipe_2

7e18237

simplify-inline and for params and body in closure help

844a269

simplify-inline,remove sep_by1_e

863ad6d

simplify-remove unused Err processing from closure_help

9640a3d

simplify-inline,detect and remove unused EClosure::Comma state

0b269dd

simplify-inline skip_first in closure_help

eecc717

simplify-inline two_bytes for the closure arrow parsing

e45c26f

simplify-inline space0_before_e for the body parser

8ca994d

simplify-inline 'and' for the body parser

8537b6e

simplify-inline space0_e for the body parser

f848ab1

simplify-inline specialize_err_ref for the body parser

434f87c

add Loc::pos to simplify closure_param

bc3656e

inline internals of loc(underscore_pattern_help())

c0d5162

simplify internals of loc_pattern_in_parens_help

c07db07

inline the parens of loc_pattern_in_parens_help

5f65e30

Merge branch 'roc-lang:main' into main

fbaf992

Merge branch 'main' into one_arg_closure_pipe_2

746700b

fix merge

525a21d

cleanup commented code

2804af6

Merge branch 'roc-lang:main' into main

17e15e7

Merge branch 'main' into one_arg_closure_pipe_2

7d81456

Inline one_of! in closure_param to simplify debugging

07e2eb8

dadhi added 23 commits October 30, 2024 21:53

closure ~ when shortcut is ready

8234ebe

Merge branch 'roc-lang:main' into main

e1a7455

Merge branch 'main' into funny_feat

f2cc92c

rename eat_space to eat_nc and remove a lot of parser

559d243

fix clippy warnings

3bea81a

optimize parse_ident and is_keyword lookup

cba844f

split Ident::Access to Ident::Plain and Access

5396175

2/3 case for the ~ when operator is covered; remove check for keyword…

bec6d78

… for underscore ident

Merge branch 'roc-lang:main' into main

6e2e34f

support chain of binops and access ops in the when operator and closu…

00fd0aa

…re shortcut

Merge branch 'main' into funny_feat - try desugaring

24d87ec

remove two_bytes and specialize_err_ref as ot used anymore

8e76213

added to tests to @fixme

527ca73

fixme expanding when closure shortcut

abd3fd8

generating unique name for the shortcut args + nice name when expandi…

bc64e1b

…ng on format

remove more stuff from parser including fn map

2e36143

simplify test names by removing simple_ prefix :-)

52e6893

Merge branch 'roc-lang:main' into main

5549eb4

simplify ident and remove unnecessary checks for kw, wip for the iden…

2f857bd

…t! suffix

merge main with purity inference and ident! names syntax

76d2208

simplify stuff around naming

e963cf5

support for the unaryop

7cef567

more tests

bd205c3

dadhi added 3 commits November 11, 2024 17:34

Merge branch 'roc-lang:main' into main

0f044e0

resolve conflicts

13fe6a1

streamlining parse_bin_op and parse_op

bc7229d

lukewilliamboswell closed this Nov 12, 2024
