Skip to content

RFC #0002 Lexical scopes, environments, stack frames and the call stack

Higor Eurípedes edited this page Oct 4, 2013 · 10 revisions

The content in this page is mostly outdated as of 2013/10/04. Most, if not all, proposed changes have been available on master for a couple months. See the Comments bellow for details.

This document is intended as a guide or reference for the possible future implementation of advanced scoping techniques like closures and continuations. The text here presented is not supposed to be regarded as an authoritative documentation on the subject.

Common languages, like the famous (and infamous) C and Java languages, implement a kind of scoping denominated "lexical scoping". This term is used because the tree-like structure of the scopes resemble the block-delimitated (on C-like languages) of the program's source code.

Clever, as an C-look-a-like, also implements lexical scoping. However, there are some functional programming language features that are very interesting from a user/programmer point of view, such as first-order functions, which require specific stack frame behavior. As of today, the Clever VM is unable to handle these use cases, because clever stack frames behave like C-style frames, that is, as soon as the execution leaves a scope, the variables and functions defined within the scope are lost.

// Non-working example
import std.*;

function lazy_double(x) {
	return function () {
		return x * 2;
	};
}

var a = lazy_double(2);
println(a()); // prints 4.
lazy_double(5);
println(a()); // this is supposed to print 4, but prints 10

The issue lies in some components of Clever's core: the symbol resolver and the virtual machine. Our current implementation of the resolver just creates a symbol table and does nothing to investigate the depth of a variable, i.e. where it has been defined. The virtual machine starts copying the function arguments and local variables as soon as it see a FCALL instruction and even after the code generation phase it still relies on scope information. All data is stored on "pools" that are indexed by either value_id (constants and temporaries) or by scope_id:value_id (variables, types, functions), and, from what I've seen, the only use of StackFrame structure is to store the return address and values.

To allow the use of the early said features, I propose the use of environments instead of regular stack frames. The purpose of this structure is to allow the existence of long-living function instances in the form of closures. The use of this kind of activation record requires frame-local addressing, instead of the current "value pool" method, to isolate the dynamic data created on the closure instantiation.

// Basic environment structure
struct Environment {
	Environment* outer; // pointer to the outer environment
	ValueVector data;   // vector of values (local vars and arguments)
}

Upon the declaration of a function in the symbol resolver, a related Environment would be created and attached to the function instance. To access external values the implementation could either emit instructions to search recursively for a value v at depth d or copy the external values to the data member.

The virtual machine would still have a call stack, but now filled with environments. Every time a FCALL is found, the VM would copy the fdata Environment, update the outer pointer to point to the top of the stack and push the newly allocated Environment to the stack. After the function execution the VM would either dispose the allocated data or leave it for the GC to take care of (in case the user requested a closure).

Function call algorithm:

  • Clone the function's environment
  • Set the function argument values
  • Push the environment to the call stack
  • Run
  • Pop the environment from the call stack
  • Dispose the allocated environment if possible

References

Comments

2013-10-04 heuripedes

I have mistakenly asserted that a function call requires environment cloning. Such operation should only be performed for closures, as their environments (generally) have an explicit dependency on the outer function environment. It should also be noted that upon a closure call all environments but the global environment must be cloned in order to ensure correct behavior of the closure.

One good optimization of this scheme is possible when variable tracking is available, because it allows us to know which environments are really needed.

So, considering this new input, the new algorithm reads as follows:

if f is a closure:
	clone f's environment and f's outer environments

fill f's arguments
push f onto the stack
run
pop f from the stack

if f is a closure:
	dispose f's environment and outer environments
Clone this wiki locally