Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Variables in a local monadic binder escape to outer scopes #681

Closed
byorgey opened this issue Sep 12, 2022 · 12 comments · Fixed by #1928
Closed

Variables in a local monadic binder escape to outer scopes #681

byorgey opened this issue Sep 12, 2022 · 12 comments · Fixed by #1928
Assignees
Labels
Bug The observed behaviour is incorrect or unexpected. C-Project A larger project, more suitable for experienced contributors. L-Language design Issues relating to the overall design of the Swarm language. S-Critical This is an issue that seriously affects playability or user experience.

Comments

@byorgey
Copy link
Member

byorgey commented Sep 12, 2022

Figured out what's causing the bug. Basically the b <- ... in the definition of until is overriding the definition of b in hanoi, which absolutely should not happen.

Originally posted by @byorgey in #669 (comment)

To reproduce, put the following in a .sw file and --run it in Creative mode:

def f = a <- scan down end;
let a = 2 in f; return (a+1)

The binding of the variable a from the definition of f is escaping into the outer scope from which f is called, so that a is now bound to inl () instead of 2, which obviously causes a runtime crash when the + operator is applied to a non-number.

Edited to add: we should keep the below discussion around for reference, but see #1087 for my latest thinking on the issue.

@byorgey byorgey added Bug The observed behaviour is incorrect or unexpected. C-Moderate Effort Should take a moderate amount of time to address. S-Critical This is an issue that seriously affects playability or user experience. L-Language design Issues relating to the overall design of the Swarm language. labels Sep 12, 2022
@byorgey
Copy link
Member Author

byorgey commented Sep 12, 2022

I was trying to do a git bisect but it seems like this bug has been around for a long, long time. For example it seems to be present already in 008d400 , which is the commit that introduced --run (it gets more annoying to test before that).

@xsebek
Copy link
Member

xsebek commented Sep 12, 2022

Yikes, good job figuring this one out @byorgey. 👍

Maybe we could make the example a unit test, to make it easier to debug/fix? 🙂

I don’t know how the binder machinery works, so this is just a guess on my part, but could we fix this by renaming variables internally?

For example the inner variable could be named f::a[0] and the outer one would be _::a[0]. 🤔

@byorgey
Copy link
Member Author

byorgey commented Sep 12, 2022

Maybe we could make the example a unit test, to make it easier to debug/fix? slightly_smiling_face

Yes, good idea.

I don’t know how the binder machinery works, so this is just a guess on my part, but could we fix this by renaming variables internally?

I mean, that's a reasonable idea, but I think it would just be a band-aid, and would probably lead to other bugs. In theory, we should be able to fix it simply by managing environments in such a way that the dynamic semantics (i.e. which values flow where) matches the static semantics (the scopes etc. enforced by the type checker).

@xsebek
Copy link
Member

xsebek commented Sep 14, 2022

The worst thing about this is that it has horrible error messages:
image

def repeat = \c. c; repeat c end;
repeat ( c <- scan down; case c (\_. say "Hi") (\_. return ()) )

@byorgey
Copy link
Member Author

byorgey commented Sep 14, 2022

It's actually a pretty good error, in the sense that it really is a fatal error and a bug in swarm, and the machine state at least gives us a bit of info about what's going wrong.

@byorgey
Copy link
Member Author

byorgey commented Sep 16, 2022

Current progress: so when a def is executed, any environment generated by evaluating the definition body definitely is not added to the current environment. However, I think the problem is that evaluating the definition body actually doesn't really do anything: it simply allocates a delayed expression in the store. The delayed value in the store is actually evaluated/executed the first time the definition is referenced. I guess the problem is that this ends up acting too much as if we had just spliced the definition body into the location of the first reference to the defined name, which doesn't respect scoping of bound names.

Evaluating a delayed expression should never cause names to get bound, however, so maybe we can just change the way that happens?

Tried wrapping evaluation of delayed things in a new FDiscardEnv frame which causes any generated environment to be discarded, but that wasn't in the right place: evaluating the body of a definition (if the body is a bind expression) won't actually generate the environment, executing it will.

I guess the real issue is that whenever we're executing something and we look up the value of a variable, we should discard any environment generated by whatever the variable resolves to, because it came from some nested or non-local scope. Even that is tricky though... we have to know when such a value looked up as the value of a variable meets an FExec frame, which might be at some later time. Maybe the solution is to make a new kind of value, VNested :: Value -> Value; when this value meets an FExec frame, it generates a FDiscardEnv frame so we discard whatever environment is generated from executing the wrapped value. I wouldn't want to wrap every single variable lookup inside VNested; we only need to do it for variables with a cmd type. So maybe we can wrap def and let bodies for things of cmd type at typechecking/elaboration time. However, one small problem is that currently the elaborator doesn't have access to the types of terms. Hmm, and also maybe we need to also wrap commands that end up getting passed as arguments to a function. Simply wrapping all variables of type cmd seems much more robust --- it will still work even if we end up later with additional binding forms for variables (e.g. pattern matching).

byorgey added a commit that referenced this issue Sep 16, 2022
@byorgey byorgey self-assigned this Oct 8, 2022
@byorgey
Copy link
Member Author

byorgey commented Oct 8, 2022

Wrapping all variables, period, is problematic, because then we have a VNested wrapper in a bunch of places where we need to ignore it but there's no nice, uniform way to do that. I am increasingly coming to the conclusion that we need to carry more type information through somehow so that we can make sure to only wrap things of type cmd a --- not just as an optimization, but as an issue of correctness. At a minimum, when we look up a variable in an environment at runtime, we need to know whether it has a cmd type or not.

@xsebek
Copy link
Member

xsebek commented Jan 21, 2023

Just wanted to share that the error messages have improved:

def repeat = \c. c; repeat c end;
repeat ( c <- scan down; case c (\_. say "Hi") (\_. return ()) )

image

def f = a <- scan down end;
let a = 2 in f; return (a+1)

image

Thanks, @byorgey! 👍

@byorgey
Copy link
Member Author

byorgey commented Jan 21, 2023

Optimistic we can actually get this fixed soon after we clean up and merge #991 , which I hope to get to soon. This past week was the first week of classes so not much extra time for me, but things should soon settle into more normal patterns...

mergify bot pushed a commit that referenced this issue Jan 25, 2023
- Closes #990

Values of type `Syntax` are as before: parsed syntax, with each node annotated with `SrcLoc`.

Values of type `Syntax' Polytype`, however, have each node annotated with *both* a `SrcLoc` *and* a `Polytype`.  (`Syntax` is really just a synonym for `Syntax' ()`.)

Type inference takes a `Syntax` and outputs a `TModule`, which now contains a `Syntax' Polytype`, in other words, a new version of the AST where every node has been annotated with the inferred type of the subterm rooted there.

---

Why is this useful?
1. It will enable us to do type-specific elaboration/rewriting.  For example I think this will allow us to solve #681 , because we only want to apply a rewrite to variables with a command type.
2. It makes type information for any specific subterm easily available.  For example I hope we will be able to use this to enhance the `OnHover` LSP handler, e.g. to show the type of the term under the mouse.

I imagine the code changes might look kind of intimidating but I don't think it's really that bad once you understand what is going on, so I'm happy to answer any questions or explain anything.
byorgey added a commit that referenced this issue Jan 25, 2023
i.e. #681.  Doesn't work yet; fixes one test case but still fails
others, and strangely seems to cause an unrelated test case to fail...
@byorgey byorgey mentioned this issue Jan 25, 2023
2 tasks
@byorgey
Copy link
Member Author

byorgey commented Jan 25, 2023

This bug is so interesting and so irritating. I am learning a lot. My latest realization (thanks to #1033) is that it's not enough to just wrap variables with a cmd type, because that does not take into account functions which return a cmd type. My whole "wrapping variables" strategy seems less viable now.

I am beginning to think the real problem is that we treat names defined via a bind the same as those created with def, i.e. as globally exported names (see 609e9b0 which was part of #303 ). Really, we only want that for top-level binds written at the REPL. We don't have this issue with def because it can only occur at the top level in the first place. I wonder if there is some way we can just make top-level binds special case, perhaps by automatically rewriting them to include a def to export their name. That is, at the top level, we could rewrite x <- c1; c2 to x <- c1; def x = x end; c2. Or, since def introduces extra machinery in terms of delay, maybe we could just add a special export construct and rewrite x <- c1; c2 to something like x <- c1; export x; c2.

@xsebek
Copy link
Member

xsebek commented Jan 25, 2023

@byorgey the export strategy sounds promising. 🙂

I am interested what your super secret final solution will be. 😄

@byorgey
Copy link
Member Author

byorgey commented Jan 25, 2023

The export strategy is a promising way to solve the wrong problem. 😄

@byorgey byorgey added C-Project A larger project, more suitable for experienced contributors. and removed C-Moderate Effort Should take a moderate amount of time to address. labels Nov 15, 2023
byorgey added a commit that referenced this issue Jun 14, 2024
byorgey added a commit that referenced this issue Jun 16, 2024
byorgey added a commit that referenced this issue Jun 16, 2024
byorgey added a commit that referenced this issue Jun 16, 2024
@mergify mergify bot closed this as completed in #1928 Jun 19, 2024
@mergify mergify bot closed this as completed in 23b5398 Jun 19, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Bug The observed behaviour is incorrect or unexpected. C-Project A larger project, more suitable for experienced contributors. L-Language design Issues relating to the overall design of the Swarm language. S-Critical This is an issue that seriously affects playability or user experience.
Projects
No open projects
Status: Done
Development

Successfully merging a pull request may close this issue.

2 participants