Three Implementation Models for Scheme
Well, the main issue with Arcueid at the moment is that it is rather slow. Probably about an order of magnitude slower than reference Arc built on MzScheme, and a profiling analysis of the system shows that one of the biggest performance-sucking parts of the system the fact that it has to move the stack-based environments onto the heap every time it makes a closure, and that also means updating the pointers to these environments stored in continuations on the stack. This is the third most time-consuming operation as can be seen from the profiling run we have in last post (this is what the ‘nextcont’ and ‘contenv’ functions do). Well, it would seem that this is a problem that Scheme implementors have solved, and I found a dissertation by R. Kent Dybvig that discusses approaches to solving this issue that are known.
I have been reading through the Dybvig dissertation and have found an idea that may be able to solve the performance issues. The current version of Arcueid basically takes a great deal of care to ensure that when an environment used in a closure is made to migrate to the heap, all continuations that make reference to that environment must also get updated. This is the reason why it takes so much time: it basically goes through the entire continuation chain, updating all places where the migrated environments might appear. Dybvig notes however that it is unnecessary to be so careful. Multiple copies of the same environment can coexist, provided that assignments to these variables are forbidden. This is true for a pure functional programming style, but Arc obviously has assignment operators. He also mentions however, that it is possible to accommodate such state mutations by means of what he calls boxes, which are used only for those variables for which assignments are done. Make a box for every variable that might conceivably be assigned from a closure, and set that as the value for the variable.
So we would add a box instruction that sets the value of a particular environment variable to a box, and modify the behaviour of the lde/ste instructions so that loads from a box result in the value referenced by the box getting loaded, and an ste to a box location result in the value getting written to the box. A box type would essentially be completely invisible to the rest of the system: it will never be seen outside of the environment, and only the GC and the virtual machine need know about them.
This has implications for C code that wants to use the same techniques though (e.g. the use of arc_dynamic_wind in arc_eval and our implementation of call/cc). Needs some thought on how this could be done.
Another thing about these boxes is that they remind me of a discussion I had with Pauan a few weeks ago on the Arc forum when proposing the design of a module system. Hyper-static scope was mentioned in this context, and apparently these same boxes that are pointers to other values are used as well. Perhaps they’ll wind up getting exposed as data types in their own right in the future as well, if we incorporate some of the ideas that came out of that discussion into Arcueid.