Variables in a local monadic binder escape to outer scopes #681

byorgey · 2022-09-12T03:58:32Z

Figured out what's causing the bug. Basically the b <- ... in the definition of until is overriding the definition of b in hanoi, which absolutely should not happen.

Originally posted by @byorgey in #669 (comment)

To reproduce, put the following in a .sw file and --run it in Creative mode:

def f = a <- scan down end;
let a = 2 in f; return (a+1)

The binding of the variable a from the definition of f is escaping into the outer scope from which f is called, so that a is now bound to inl () instead of 2, which obviously causes a runtime crash when the + operator is applied to a non-number.

Edited to add: we should keep the below discussion around for reference, but see #1087 for my latest thinking on the issue.

The text was updated successfully, but these errors were encountered:

byorgey · 2022-09-12T04:12:33Z

I was trying to do a git bisect but it seems like this bug has been around for a long, long time. For example it seems to be present already in 008d400 , which is the commit that introduced --run (it gets more annoying to test before that).

xsebek · 2022-09-12T05:35:54Z

Yikes, good job figuring this one out @byorgey. 👍

Maybe we could make the example a unit test, to make it easier to debug/fix? 🙂

I don’t know how the binder machinery works, so this is just a guess on my part, but could we fix this by renaming variables internally?

For example the inner variable could be named f::a[0] and the outer one would be _::a[0]. 🤔

byorgey · 2022-09-12T10:02:11Z

Maybe we could make the example a unit test, to make it easier to debug/fix? slightly_smiling_face

Yes, good idea.

I don’t know how the binder machinery works, so this is just a guess on my part, but could we fix this by renaming variables internally?

I mean, that's a reasonable idea, but I think it would just be a band-aid, and would probably lead to other bugs. In theory, we should be able to fix it simply by managing environments in such a way that the dynamic semantics (i.e. which values flow where) matches the static semantics (the scopes etc. enforced by the type checker).

xsebek · 2022-09-14T20:33:48Z

The worst thing about this is that it has horrible error messages:

def repeat = \c. c; repeat c end;
repeat ( c <- scan down; case c (\_. say "Hi") (\_. return ()) )

byorgey · 2022-09-14T21:09:52Z

It's actually a pretty good error, in the sense that it really is a fatal error and a bug in swarm, and the machine state at least gives us a bit of info about what's going wrong.

byorgey · 2022-09-16T02:58:52Z

Current progress: so when a def is executed, any environment generated by evaluating the definition body definitely is not added to the current environment. However, I think the problem is that evaluating the definition body actually doesn't really do anything: it simply allocates a delayed expression in the store. The delayed value in the store is actually evaluated/executed the first time the definition is referenced. I guess the problem is that this ends up acting too much as if we had just spliced the definition body into the location of the first reference to the defined name, which doesn't respect scoping of bound names.

Evaluating a delayed expression should never cause names to get bound, however, so maybe we can just change the way that happens?

Tried wrapping evaluation of delayed things in a new FDiscardEnv frame which causes any generated environment to be discarded, but that wasn't in the right place: evaluating the body of a definition (if the body is a bind expression) won't actually generate the environment, executing it will.

I guess the real issue is that whenever we're executing something and we look up the value of a variable, we should discard any environment generated by whatever the variable resolves to, because it came from some nested or non-local scope. Even that is tricky though... we have to know when such a value looked up as the value of a variable meets an FExec frame, which might be at some later time. Maybe the solution is to make a new kind of value, VNested :: Value -> Value; when this value meets an FExec frame, it generates a FDiscardEnv frame so we discard whatever environment is generated from executing the wrapped value. I wouldn't want to wrap every single variable lookup inside VNested; we only need to do it for variables with a cmd type. So maybe we can wrap def and let bodies for things of cmd type at typechecking/elaboration time. However, one small problem is that currently the elaborator doesn't have access to the types of terms. Hmm, and also maybe we need to also wrap commands that end up getting passed as arguments to a function. Simply wrapping all variables of type cmd seems much more robust --- it will still work even if we end up later with additional binding forms for variables (e.g. pattern matching).

byorgey · 2022-10-08T15:03:36Z

Wrapping all variables, period, is problematic, because then we have a VNested wrapper in a bunch of places where we need to ignore it but there's no nice, uniform way to do that. I am increasingly coming to the conclusion that we need to carry more type information through somehow so that we can make sure to only wrap things of type cmd a --- not just as an optimization, but as an issue of correctness. At a minimum, when we look up a variable in an environment at runtime, we need to know whether it has a cmd type or not.

xsebek · 2023-01-21T17:17:01Z

Just wanted to share that the error messages have improved:

def repeat = \c. c; repeat c end;
repeat ( c <- scan down; case c (\_. say "Hi") (\_. return ()) )

def f = a <- scan down end;
let a = 2 in f; return (a+1)

Thanks, @byorgey! 👍

byorgey · 2023-01-21T18:13:20Z

Optimistic we can actually get this fixed soon after we clean up and merge #991 , which I hope to get to soon. This past week was the first week of classes so not much extra time for me, but things should soon settle into more normal patterns...

- Closes #990 Values of type `Syntax` are as before: parsed syntax, with each node annotated with `SrcLoc`. Values of type `Syntax' Polytype`, however, have each node annotated with *both* a `SrcLoc` *and* a `Polytype`. (`Syntax` is really just a synonym for `Syntax' ()`.) Type inference takes a `Syntax` and outputs a `TModule`, which now contains a `Syntax' Polytype`, in other words, a new version of the AST where every node has been annotated with the inferred type of the subterm rooted there. --- Why is this useful? 1. It will enable us to do type-specific elaboration/rewriting. For example I think this will allow us to solve #681 , because we only want to apply a rewrite to variables with a command type. 2. It makes type information for any specific subterm easily available. For example I hope we will be able to use this to enhance the `OnHover` LSP handler, e.g. to show the type of the term under the mouse. I imagine the code changes might look kind of intimidating but I don't think it's really that bad once you understand what is going on, so I'm happy to answer any questions or explain anything.

i.e. #681. Doesn't work yet; fixes one test case but still fails others, and strangely seems to cause an unrelated test case to fail...

byorgey · 2023-01-25T12:38:44Z

This bug is so interesting and so irritating. I am learning a lot. My latest realization (thanks to #1033) is that it's not enough to just wrap variables with a cmd type, because that does not take into account functions which return a cmd type. My whole "wrapping variables" strategy seems less viable now.

I am beginning to think the real problem is that we treat names defined via a bind the same as those created with def, i.e. as globally exported names (see 609e9b0 which was part of #303 ). Really, we only want that for top-level binds written at the REPL. We don't have this issue with def because it can only occur at the top level in the first place. I wonder if there is some way we can just make top-level binds special case, perhaps by automatically rewriting them to include a def to export their name. That is, at the top level, we could rewrite x <- c1; c2 to x <- c1; def x = x end; c2. Or, since def introduces extra machinery in terms of delay, maybe we could just add a special export construct and rewrite x <- c1; c2 to something like x <- c1; export x; c2.

xsebek · 2023-01-25T21:07:20Z

@byorgey the export strategy sounds promising. 🙂

I am interested what your super secret final solution will be. 😄

byorgey · 2023-01-25T21:21:24Z

The export strategy is a promising way to solve the wrong problem. 😄

byorgey mentioned this issue Sep 12, 2022

Remove test challenges and add some more interesting ones #669

Merged

byorgey added a commit that referenced this issue Sep 16, 2022

add test case for #681

26eab16

byorgey mentioned this issue Oct 8, 2022

Fatal error: Bad machine state #736

Closed

byorgey self-assigned this Oct 8, 2022

byorgey mentioned this issue Jan 10, 2023

Annotate ASTs with types at every node #991

Merged

xsebek mentioned this issue Jan 23, 2023

Values referenced after a recursive call get overwritten #1032

Closed

byorgey mentioned this issue Jan 24, 2023

recursion bug demo #1033

Closed

byorgey added a commit that referenced this issue Jan 25, 2023

WIP: try fixing issue with monadic binders escaping their scope

123fc7f

i.e. #681. Doesn't work yet; fixes one test case but still fails others, and strangely seems to cause an unrelated test case to fail...

byorgey mentioned this issue Jan 25, 2023

Release 0.3 #1012

Closed

2 tasks

byorgey mentioned this issue Feb 5, 2023

suspend, i.e. the Secret Master Plan for Fixing All The Things #1087

Closed

byorgey added C-Project A larger project, more suitable for experienced contributors. and removed C-Moderate Effort Should take a moderate amount of time to address. labels Nov 15, 2023

kabuhr mentioned this issue Apr 4, 2024

Fatal error: Bad machine state in stepRobot: FApp of non-function #1796

Closed

byorgey mentioned this issue Jun 14, 2024

Refactor CESK machine to track environments through new Suspended state #1928

Merged

byorgey added a commit that referenced this issue Jun 14, 2024

tests for #681 and #1032

e8bc9be

byorgey added a commit that referenced this issue Jun 16, 2024

tests for #681 and #1032

51ac8c1

byorgey added a commit that referenced this issue Jun 16, 2024

tests for #681 and #1032

e3c61ca

byorgey added a commit that referenced this issue Jun 16, 2024

tests for #681 and #1032

74dd5fc

mergify bot closed this as completed in #1928 Jun 19, 2024

mergify bot closed this as completed in 23b5398 Jun 19, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Variables in a local monadic binder escape to outer scopes #681

Variables in a local monadic binder escape to outer scopes #681

byorgey commented Sep 12, 2022 •

edited

Loading

byorgey commented Sep 12, 2022

xsebek commented Sep 12, 2022

byorgey commented Sep 12, 2022

xsebek commented Sep 14, 2022

byorgey commented Sep 14, 2022

byorgey commented Sep 16, 2022 •

edited

Loading

byorgey commented Oct 8, 2022 •

edited

Loading

xsebek commented Jan 21, 2023

byorgey commented Jan 21, 2023

byorgey commented Jan 25, 2023 •

edited

Loading

xsebek commented Jan 25, 2023

byorgey commented Jan 25, 2023

Variables in a local monadic binder escape to outer scopes #681

Variables in a local monadic binder escape to outer scopes #681

Comments

byorgey commented Sep 12, 2022 • edited Loading

byorgey commented Sep 12, 2022

xsebek commented Sep 12, 2022

byorgey commented Sep 12, 2022

xsebek commented Sep 14, 2022

byorgey commented Sep 14, 2022

byorgey commented Sep 16, 2022 • edited Loading

byorgey commented Oct 8, 2022 • edited Loading

xsebek commented Jan 21, 2023

byorgey commented Jan 21, 2023

byorgey commented Jan 25, 2023 • edited Loading

xsebek commented Jan 25, 2023

byorgey commented Jan 25, 2023

byorgey commented Sep 12, 2022 •

edited

Loading

byorgey commented Sep 16, 2022 •

edited

Loading

byorgey commented Oct 8, 2022 •

edited

Loading

byorgey commented Jan 25, 2023 •

edited

Loading