Change `run` to `import` and typecheck it properly #495

byorgey · 2022-06-26T20:06:27Z

Elsewhere (e.g. #7, #208, #328) I have talked about how I don't like the run command and want to get rid of it. However, I'd like to explore an alternative: maybe I don't like it because it's not typechecked properly!

In particular, right now run : string -> cmd (), and typechecking it is utterly trivial: just make sure it is given a string as an argument. However, this leads to a few problems:

In Ability to include one file of definitions in another #208 , we see that typechecking (and in particular name resolution) does not take into account any definitions that are made in a file which is run. So if x.sw defines go : cmd (), then run("x.sw"); go would work perfectly fine at runtime, but currently does not typecheck, with an error about go being unbound.
In run prints value with bogus type #328 , the type of the value in the file is not taken into account, giving us bizarre things being printed out like true : ().
In fact, the way run works now can break type safety entirely: run breaks type safety #2068.

So here's an alternative we might contemplate:

Rename run to import and make it special built-in syntax (requiring an argument which is a string constant) rather than a command constant.
To typecheck import "foo", read foo.sw from disk and typecheck it (unless it has already been imported before, in which can we can just look it up in a map). The type of import "foo" is simply the type of the Swarm expression in foo.sw, and also binds the names of any definitions it contains. In other words it should behave exactly as if the contents of foo.sw had been typed in place of the import (modulo the fact that files can be imported more than once but only need to be checked once).
- Note this might require us to add IO to the type inference monad. Or actually, maybe not --- perhaps we could add a phase in between parsing and type checking where we gather up all the import statements and load and parse their referents into a Map, recursively processing any imports encountered; then we pass that map to the typechecker too. In any case it seems like it requires us to add IO to the processTerm function in one way or another.

This would still not be a true module system (in particular there's no way to do qualified imports, or hide names, etc.) but it would be way better than what we have, and it would at least solve #208 and #328 .

The text was updated successfully, but these errors were encountered:

xsebek · 2022-06-26T20:26:57Z

Thanks for writing this @byorgey, this sounds great and could be a huge improvement of S-Critical levels! 😄

For example, my earlier error with not included sources (#411) would be caught at type checking. 👍

byorgey · 2022-06-26T21:00:27Z

OK, in that case I've marked it S-Critical!

TristanCacqueray · 2022-06-26T22:17:55Z

Perhaps import could add the definitions to the environment using qualified names, for example, given a file my.sw:

// my.sw
def m2 = move; move end
def uturn = turn right; turn right end

Then running import "my.sw" would add definitions named my.m2 and my.uturn

We might also want to consider "dhall style" import. If I understand correctly how this would work, then any Term could be an import statement (e.g. a file or URL), and the import resolution system would substitute such terms with the content of their import locations. So for example, given this file:

// m4.sw
move; move; move; move

Then the following expressions are valid:

build {./m4.sw}
def m4 = ./m4.sw; end

And we could use the dot syntax for import that contains a list of definitions. For example, instead of using import "my.sw", we could define a new namespace:

>>> def my = ./my.sw end
>>> build {my.m2; ...}

Well if we can make this work, then it would be a nice solution for code packaging and distribution. Though I'm not really sure it's worth the effort, but I figured it may be useful to mention it here.

byorgey · 2022-06-27T02:02:10Z

@TristanCacqueray thanks for the idea! I think Swarm is actually well suited for this style since everything in the language is an expression. Even a file full of defs is really just one big expression that happens to be a chain of def commands joined with semicolons. It would be super cool to be able to e.g. directly import files from URLs --- e.g. if someone had some definitions on the wiki you could just import them directly, without having to download the file first.

In fact, now that I think about it, this is pretty much already how I was suggesting import should work --- the difference is really just one of syntax. We'll probably want to bikeshed the syntax (and make sure it's something we can parse unambiguously), but in general I'm very open to using a more lightweight syntax for imports, and supporting URLs as well.

The issue of namespacing is a separate issue, I think. Having names from imports automatically qualified seems too heavyweight --- I definitely don't want to have to type my.whatever for every single name. If it becomes important I think we can come up with something relatively simple to manage qualified names.

byorgey · 2022-09-16T14:06:38Z

An important caveat / thing that has to be properly addressed and designed for is that there needs to be an explicit difference between "re-fetch the most current version of this thing from the filesystem or network since it's been updated and I want the latest version" and "refer to a cached version of this thing which is named by an external URI but shouldn't be re-fetched every time it is used". This is more than just a difference in semantics; the first must have some sort of cmd type whereas the second does not have to. Perhaps for the first we just want some kind of load : string -> cmd () which fetches and caches, but does not run/execute, the named resource (and any that it recursively refers to). For convenience we could then define run t to do a load followed by executing the loaded thing.

xsebek · 2022-09-17T11:27:46Z

Is load necessary? IMO we should be able to resolve this when type-checking import.

For example, imagine running import "HTTP://some-swarm-file.sw" several times. We can store it in the cache and then only check if it (and transitive dependencies) changed. (Bonus points for checking ETag or Term difference.)

I think what we need is to:

map import "http://source" to Import (WebSource "...")
map WebSource to CachedSource (here we have to add hash/ETag to distinguish version)
- this is not necessary for local files, so we probably want something like data Source = LocalSource | CachedSource.
load Source to memory, type-check it and give it an internal name

Also Location may need to know about files.

EDIT: Actually we need to have the source loaded into memory to construct a well-typed Term, so caching, type-checking and creating an internal name must happen immediately.

byorgey · 2022-09-17T20:08:52Z

What you said about having data Source, etc. all sounds good. However, I don't think having a single import is satisfactory. Let me see if I can clearly explain my reasoning.

First of all, note that everything in Swarm is an expression, so there is no such thing as a "module" which is necessarily full of definitions. So if you reference http://blah/thing.sw it could have definitions... but it could also be e.g. the number 4, or a pair of two lambdas, or whatever.
Because of this, and because it's a cool and FP-ish idea, I like the suggestion above to make "Dhall-style" imports, i.e. a lightweight syntax for referring to remote terms, like e.g. def x : int = 3 + [http://blah.com/the_number.sw]. This has a couple implications:
- The type of an imported thing might not be a cmd type, so for the sake of referential transparency we want to just cache it the first time it is referenced and never reload it. For example we want let x = [http://foo] in x + x to always be the same as [http://foo] + [http://foo].
- It is not hard to imagine calling some remote function many times. If we have to actually check every time over the network to see whether it has been updated, that's going to seriously slow things down.
I think I'm not so keen on load any more, but maybe instead we should keep run : string -> cmd (), where the thing referenced by the URI must have type cmd (), and will be explicitly reloaded and executed.
- However, I'm not sure how to deal with transitive imports within the thing being reloaded. It seems like we can very easily get ourselves in a situation where we break referential transparency again.

This is actually much more theoretically complex and interesting than I thought. Basically because Swarm programs are going to be long-running, we need to support some kind of "hot reloading". Dhall does not have to deal with this because Dhall programs do not run for a long time; as I understand it has an "import resolution" phase at the beginning where it essentially replaces all import expressions with the current contents of the referenced file, and only then proceeds to evaluate everything using that snapshot of all the imported file contents. If you edit the files while Dhall is evaluating things it will have no effect; you would have to re-run the Dhall program from the beginning to see the changes.

xsebek · 2022-09-17T21:01:21Z

I am very confused. Your first branch of bullet points leads to caching files, but that is exactly what I was pondering previously. But in the second branch you argue for keeping run as it is now in some form.

I do not understand why it would be a good idea to hot reload instead of doing the loading once at the beginning like Dhall. Its approach sounds very reasonable to me and exactly like what this Issue originally proposed.

In particular I do not see what prevents us from resolving the source at the time it is type-checked — even if it is number 42 or “Hello world!”

If I want to import over network I can probably live with a less then a second delay of downloading a file in the background. The technical difficulty would be to put type checking in another thread and make sure it returns before we let the command be executed.

But that to me sounds much preferable to:

network calls during pure execution
escaping the game by changing the network file while some robot is off querying it

byorgey · 2022-09-17T21:30:33Z

I am very confused. Your first branch of bullet points leads to caching files, but that is exactly what I was pondering previously. But in the second branch you argue for keeping run as it is now in some form.

Sorry for the confusion --- I was trying to argue why we need both.

I do not understand why it would be a good idea to hot reload instead of doing the loading once at the beginning like Dhall. Its approach sounds very reasonable to me and exactly like what this Issue originally proposed.

In particular I do not see what prevents us from resolving the source at the time it is type-checked — even if it is number 42 or “Hello world!”

Hmm... maybe you are right. 🤔 The scenario I had in mind is when you have a file of definitions, mydefs.sw, so at the REPL prompt you type

> import "mydefs.sw"

(or whatever the syntax is), this causes the definitions to be loaded from the file, typechecked etc. But later you edit the file and then want to reload the new, changed definitions. So you type import "mydefs.sw" at the REPL prompt again. Do you get the newly edited file being loaded again? Or do you just get a cached version? I thought this was a tricky question, but now I think I see that it is not: because we are entering a brand new Swarm program (i.e. our REPL input), we should go (re)load the file while we are typechecking the input. On the other hand, suppose the commands in that file cause a robot to be built and go off running a program. That robot will forever use the program as it was defined at the time the robot was created, even if we later edit the file.

In summary, I was confused about the difference between robot programs running for a long time, and a game running for a long time during which you might enter many REPL inputs. As you say, I think the key idea is to resolve imports at typechecking time, just like Dhall does. That naturally implies that a long-running robot program will never see updates to remote files (because it is typechecked once and the runs for a long time), but entering stuff at the REPL is a natural opportunity to get the most recent version of something every time you reference it (because every time you enter something at the REPL, it is a new Swarm expression which has to be type-checked etc.).

So in the end, I agree we are back to what was originally proposed in this issue. But now I understand it much better. 😄

xsebek · 2022-09-18T07:59:19Z

@byorgey OK, I think we understood each other, hopefully I did not come across as too dismissive. 🙂

The thing leads me to argue for long running robots to not get extra power from run is that I have other solutions in mind. For example the web API to supplement REPL, antenna to communicate with robots and keyboard control to drive around. 😉

PS: Also immediately type checked import does not give extra power, so we do not need to come up with a capability and device for it! 😄

byorgey · 2022-09-18T19:33:08Z

@byorgey OK, I think we understood each other, hopefully I did not come across as too dismissive. 🙂

No, not at all! I'm really grateful to have someone who won't let me get away with nonsense. 😄

The thing leads me to argue for long running robots to not get extra power from run is that I have other solutions in mind. For example the web API to supplement REPL, antenna to communicate with robots and keyboard control to drive around. wink

Yes, I completely, 100% agree. Like I said, when I was talking about "hot swapping" what I actually had in mind was the ability to reload a previously loaded module by typing it at the REPL again (which is not really hot swapping at all), not hot swapping new code into a running robot.

PS: Also immediately type checked import does not give extra power, so we do not need to come up with a capability and device for it! 😄

Agreed, yay!

xsebek · 2022-11-06T18:06:52Z

@byorgey can we improve this in a minimal way and get 90% of the benefits?

I am thinking about initially restricting import to be nonrecursive or even one-level.

With that restriction, we could add a stage before type-checking where in processParsedTerm' for each import fp.sw we:

replace the import with an internal variable __fp (keeping the location)
load the file from the disk
infer the type of the file - T
add the type to the context - __fp: T
EDIT: add definitions from the imported file to the context as well

This would involve adding IO to the type of processParsedTerm', but I think that is necessary for the eventual full solution as well. I hope this could serve as the first stepping stone to iteratively improve import instead of doing it all at once as a big project.

What do you think @byorgey, is this a reasonable proposal or a dead end? 🙂

byorgey · 2022-11-06T20:15:52Z

I think it is a reasonable proposal but I do not think it is actually much easier than the full thing. It is like getting 90% of the benefits with 90% of the work. 😉 I am actually thinking of working on this next, and I don't think it will be hard.

xsebek · 2022-11-06T20:30:48Z

@byorgey great, I trust your judgement and I am happy to leave it to you. 👍

In my defence you tagged this as C-Project so I thought there would be a lot more work involved with caching the files and resolving collisions and whatnot. 😅

byorgey · 2022-11-06T20:38:00Z

I mean, there are probably lots of details to get right so it will be a decent amount of work, hence deserving C-Project. But I don't think it is conceptually difficult.

🤔 maybe we need different sets of tags to distinguish between how difficult something will be and how much work it will be...

byorgey · 2022-12-14T23:01:26Z

OK, I take it back, this is super annoying. 😠 Basically the problem is that processTerm and friends now require IO, as @xsebek points out above. But in several places we use processTerm as part of some FromJSON instances (in particular for Robot and Scenario), which definitely cannot involve IO. So we have to somehow stage it so we can first read JSON while leaving any embedded terms as Text, and then have a later resolution stage where we process them (loading any imports from disk/network).

It would be easy enough to just put Either Text ProcessedTerms in, have the FromJSON instances always return Left, and then have a function which goes through and changes everything to Right. But it would be easy to forget or miss some, etc. so I'd really like to do it in a typesafe way, so that the type of e.g. Scenario tells you whether it has raw Text terms or ProcessedTerm values. We already do something kind of similar for Robot, but so far it's been tedious and annoying getting everything to work. I think I can do it though.

UPDATE: It's even worse than I thought, the tendrils reach all the way into reading the palette in a world description, because it has a FromJSONE instance that requires a RobotMap, which is built from TRobot values, which contain ProcessedTerms...

byorgey · 2023-10-01T02:19:08Z

Loading and resolving imports at parse time is simply not going to work. I think the next thing I'm going to try is to parameterize Syntax not on the type of annotation, but on a phase, then use type functions to decide on various types depending on the phase. That way we can have a parsing phase which leaves imports unprocessed, then an import resolution phase, then typechecking, etc. and keep everything relatively type safe (e.g. we won't be able to accidentally start typechecking a term that hasn't had all its imports processed).

xsebek · 2023-10-01T08:21:27Z

@byorgey robots also need this change, because they are expected to have CESK already after parsing. I don't think that is an option as we don't have IO there.

So I think they need an extra Parsing phase before Template, so that the parsing is done once.

byorgey · 2023-10-01T13:25:24Z

@xsebek agreed! I think it is going to be a far-reaching change but I'm hoping parameterizing everything by phase will make it all work.

…actoring (#2074) A bit of initial refactoring towards #495 . - Replace `TSyntax` with `Syntax` in a couple places where the types are not needed. - Once we implement #495 producing `TSyntax` is going to get a lot harder, so the more places we can eliminate it the better. - Also some related minor refactoring, and a new function `readNonemptyTerm`. We're not using it yet, but (1) I thought I would need it in this PR, and (2) when I realized I didn't, I decided to leave it there anyway since it will likely come in handy later.

Now that `Failure` is moved to `swarm-util` (#2155), that was the last remaining dependency of `Swarm.Game.ResourceLoading` which prevented it from moving as well. The `ResourceLoading` code is not specific to Swarm scenarios; this PR moves it to `swarm-util` to make it more generally applicable. In fact, the `ResourceLoading` code is already being used to load `.sw` files when executing the `Run` command; in order to implement #495 we will need to do that loading in order to resolve `import` expressions, but that code will live in `swarm-lang`, which cannot import `swarm-scenario`.

byorgey mentioned this issue Jun 26, 2022

Ability to save definitions typed at the REPL to a .sw file #7

Open

byorgey added S-Critical This is an issue that seriously affects playability or user experience. and removed S-Moderate The fix or feature would substantially improve user experience. labels Jun 26, 2022

byorgey mentioned this issue Jun 28, 2022

Ability to include one file of definitions in another #208

Closed

xsebek mentioned this issue Jul 18, 2022

Design and implement more capabilities and devices #26

Closed

21 tasks

byorgey mentioned this issue Sep 4, 2022

run prints value with bogus type #328

Open

This was referenced Oct 3, 2022

Better support for mutual recursion #605

Open

Refactor move tutorial using split and whoami #724

Merged

xsebek mentioned this issue Nov 5, 2022

Run code from a .sw file as part of a scenario's goal condition #828

Closed

xsebek mentioned this issue Nov 7, 2022

Faster sheep check #844

Closed

byorgey self-assigned this Nov 22, 2022

byorgey mentioned this issue Dec 15, 2022

Interpret imports relative to the location of the file containing them rather than the current working directory #915

Open

byorgey mentioned this issue Feb 5, 2023

suspend, i.e. the Secret Master Plan for Fixing All The Things #1087

Closed

xsebek mentioned this issue Sep 30, 2023

robot wave scenario #1556

Merged

byorgey mentioned this issue Dec 14, 2023

dim sum restaurant #1686

Merged

byorgey mentioned this issue Jun 1, 2024

Recursive types #154

Closed

This was referenced Jun 11, 2024

Recording and playback #1926

Open

Refactor CESK machine to track environments through new Suspended state #1928

Merged

This was referenced Jul 21, 2024

run breaks type safety #2068

Open

Replace TSyntax with Syntax in a couple places, plus a bit of refactoring #2074

Merged

byorgey mentioned this issue Sep 30, 2024

Refactor: move ResourceLoading to swarm-util #2162

Merged

byorgey mentioned this issue Oct 7, 2024

Investigate terms being processed in tick #1560

Open

This was referenced Oct 15, 2024

Dictionary example #2161

Merged

Running a Swarm script from command line does not load definitions #2182

Open

xsebek mentioned this issue Oct 20, 2024

Add generic default #2192

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Change `run` to `import` and typecheck it properly #495

Change `run` to `import` and typecheck it properly #495

byorgey commented Jun 26, 2022 •

edited

Loading

xsebek commented Jun 26, 2022 •

edited

Loading

byorgey commented Jun 26, 2022

TristanCacqueray commented Jun 26, 2022

byorgey commented Jun 27, 2022

byorgey commented Sep 16, 2022 •

edited

Loading

xsebek commented Sep 17, 2022 •

edited

Loading

byorgey commented Sep 17, 2022

xsebek commented Sep 17, 2022

byorgey commented Sep 17, 2022

xsebek commented Sep 18, 2022

byorgey commented Sep 18, 2022

xsebek commented Nov 6, 2022 •

edited

Loading

byorgey commented Nov 6, 2022

xsebek commented Nov 6, 2022

byorgey commented Nov 6, 2022

byorgey commented Dec 14, 2022 •

edited

Loading

byorgey commented Oct 1, 2023

xsebek commented Oct 1, 2023

byorgey commented Oct 1, 2023

Change run to import and typecheck it properly #495

Change run to import and typecheck it properly #495

Comments

byorgey commented Jun 26, 2022 • edited Loading

xsebek commented Jun 26, 2022 • edited Loading

byorgey commented Jun 26, 2022

TristanCacqueray commented Jun 26, 2022

byorgey commented Jun 27, 2022

byorgey commented Sep 16, 2022 • edited Loading

xsebek commented Sep 17, 2022 • edited Loading

byorgey commented Sep 17, 2022

xsebek commented Sep 17, 2022

byorgey commented Sep 17, 2022

xsebek commented Sep 18, 2022

byorgey commented Sep 18, 2022

xsebek commented Nov 6, 2022 • edited Loading

byorgey commented Nov 6, 2022

xsebek commented Nov 6, 2022

byorgey commented Nov 6, 2022

byorgey commented Dec 14, 2022 • edited Loading

byorgey commented Oct 1, 2023

xsebek commented Oct 1, 2023

byorgey commented Oct 1, 2023

Change `run` to `import` and typecheck it properly #495

Change `run` to `import` and typecheck it properly #495

byorgey commented Jun 26, 2022 •

edited

Loading

xsebek commented Jun 26, 2022 •

edited

Loading

byorgey commented Sep 16, 2022 •

edited

Loading

xsebek commented Sep 17, 2022 •

edited

Loading

xsebek commented Nov 6, 2022 •

edited

Loading

byorgey commented Dec 14, 2022 •

edited

Loading