Tail Recursive Stack Disciplines for an Interpreter: Arc Version
As I mentioned previously I have begun rewriting the interpreter core to address some of the deficiencies I had encountered in the previous architecture, most notably how the C interface fits in, as this seems like a rather fundamental deficiency. I am presently reworking the core interpreter to fit into the new architecture, and while I’m at it, I might as well begin making some optimisations that I should have tried to do.
I ran across a paper by Richard A. Kelsey: Tail-Recursive Stack Disciplines for an Interpreter, where he describes some techniques for managing continuations and such on the stack. Essentially, what previous versions of Arcueid have done is essentially store all environments and continuations on the heap, as Appel suggests. This seems like the most logical thing to do, because Arc’s parameters are substantially more complicated than Scheme’s or those of other Lisp dialects. There are optional parameters, rest parameters, and most complicated of all, destructuring binds. Arcueid’s compiler essentially copies all parameters off the stack and into the heap, performing destructuring binds and the evaluation of optional arguments as they are required, so that parameter access is as simple as possible. However, this is slow because every parameter is copied from the stack to the heap, and tail calls, such as they are, produce large quantities of garbage that might not be immediately collected.
What we can try to do, however, is use the stack itself to store environments, as the values were already put there by the caller to begin with. Of course, optional arguments, rest parameters, and destructuring binds still need to be handled, and there are a number of approaches we can take to handle this in our interpreter.
The env instruction that creates environments can instead be a three-parameter instruction that takes the number of flat arguments that should be present, the number of optional arguments after the last one, and the number of additional arguments that are to be bound to the destructuring binds. The instruction does the following things:
- The first parameter determines the minimum portion of the stack that becomes part of the environment, with destructuring bind arguments being counted as only one argument for these purposes. These are the minimum number of parameters that can be passed to the function.
- The second parameter determines how many destructuring bind parameters are added.
- The third parameter determines how many optional parameters can be expected from the stack.
The env instruction thus sets up the stack to hold an environment that includes all parameters that have been pushed on the stack, determines which optional parameters were unbound, and adds extra space for destructuring bind arguments. For example, for this rather complicated parameter list:
(fn (a (b c) d (o e d)) ... )
The env instruction would be
env 3 2 1, making an environment that is six elements in size. The environment entries would thus be:
|5||e (may be unbound)|
Additional instructions to destructure environment slot 1 and bind the results to envronment slots 3 and 4 then follow, as will an instruction to check if environment slot 5 is already bound to some value. If not, instructions to compute the default value are available.
Tenatively the above parameters should compile to something like:
env 3 2 1
; Destructuring bind instructions
lde 0 1
ste 0 3
lde 0 1
ste 0 4
; Optional arguments
lde 0 5
lde 0 2
The new instruction
jbnd (jump if bound) is used to jump if the value register is not the value CUNBOUND.
The envr instruction is used for parameters that look like this:
(fn (a (b c) d (o e d) . f) ... )
That would be
envr 3 1 2, making an environment that is seven elements in size.
|3||e (may be unbound)|
The rest parameters are bound up by the envr instruction so with the sole exception of envr replacing env as needed.
What happens when a closure is returned, like say for a higher-order function, or when call/cc creates a continuation? There are several implementation strategies that might work, and there are strategies such as spaghetti stacks as well as perhaps copying the whole stack, which might be more expensive if ccc is used a lot but perhaps not so much in actual practice. We’ll see just how all of this fits together.