Arcueid 0.1.1 released

2013-04-25

Well, that was a bit fast. Arcueid 0.1.1 is released. You can download it here.

This is just a bugfix release, but a rather significant one: it is the first version of Arcueid that can actually run news.arc completely. Several subtle bugs in 0.1.0 prevented news.arc from running correctly, most notably in the handling of ssyntax and the treatment of the andf operator. These have been fixed, and news.arc now runs properly.

Performance is sorely lacking, however, and it’s probably still unstable, but I hope to address both in forthcoming releases. The current version gives the following timings when running news.arc under ApacheBench:


Concurrency Level:      1
Time taken for tests:   23.292 seconds
Complete requests:      100
Failed requests:        0
Write errors:           0
Total transferred:      233500 bytes
HTML transferred:       226100 bytes
Requests per second:    4.29 [#/sec] (mean)
Time per request:       232.915 [ms] (mean)
Time per request:       232.915 [ms] (mean, across all concurrent requests)
Transfer rate:          9.79 [Kbytes/sec] received

Connection Times (ms)
min  mean[+/-sd] median   max
Connect:        0    0   0.0      0       0
Processing:   166  233  34.6    227     372
Waiting:      108  207  34.4    200     332
Total:        166  233  34.6    227     372

Percentage of the requests served within a certain time (ms)
50%    227
66%    242
75%    251
80%    252
90%    281
95%    308
98%    349
99%    372
100%    372 (longest request)

Rather dismal, I’m sorry to say. A large part of that seems to be the garbage collector, which runs much too aggressively; turning it off improves performance rather dramatically:


Concurrency Level:      1
Time taken for tests:   5.162 seconds
Complete requests:      100
Failed requests:        0
Write errors:           0
Total transferred:      233500 bytes
HTML transferred:       226100 bytes
Requests per second:    19.37 [#/sec] (mean)
Time per request:       51.624 [ms] (mean)
Time per request:       51.624 [ms] (mean, across all concurrent requests)
Transfer rate:          44.17 [Kbytes/sec] received

Connection Times (ms)
min  mean[+/-sd] median   max
Connect:        0    0   0.0      0       0
Processing:    44   52   7.1     50     102
Waiting:       22   24   2.3     24      37
Total:         44   52   7.1     50     103

Percentage of the requests served within a certain time (ms)
50%     50
66%     51
75%     52
80%     52
90%     58
95%     66
98%     72
99%    103
100%    103 (longest request)

I’ll be working on garbage collection performance and other improvements over the next few releases, as well as addressing some stability issues observed during testing.

Arcueid 0.1.0 released

2013-04-23

Arcueid 0.1.0 is released. You can download it here.

Version 0.1.0 is an almost complete rewrite and incorporates too many changes to list, but we do have the following notable ones:

  • Better performance (I think) because we now avoid copying stuff from the stack into the heap unless absolutely necessary
  • C foreign function interface revised
  • Readline persistent history (although it has its problems that I can’t seem to trace just yet)
  • 32-bit support now seems to be working fine

It can sorta run news.arc but it still has some problems, most notably that it’s not possible to post URLs as stories.

Status update: imminent release

2013-04-23

Well, after a few more days of work, Arcueid 0.1.0 is very nearly ready for release. You can check out the experimental branch from Github if you like and see how it runs. It’s now somewhat usable, more stable, and definitely faster than the 0.0.x series. It can almost run news.arc properly, although there are some problems (e.g. posting URLs in stories causes it to crash with a failed apply error), and it only passes 291 of the 313 Conan Dalton unit tests, not counting the double-bar symbol tests, which exercise a feature reference Arc inadvertently inherited from MzScheme/Racket and which I have no intention of ever implementing in full generality.

I’ll do a little cleaning up before producing the final 0.1.0 release in a few days or so.

The Garbage Man’s Blues

2013-04-19

Arcueid 0.1.0 is now almost complete. The only thing still missing before it can run news.arc is the networking code, but that shouldn’t be difficult to do. I have, however, gotten into some trouble with garbage collection, which is a painful business. I’m starting off with a basic mark and sweep garbage collector, and now I rather regret having removed the old write barrier code that used to be part of my original code: it could have been used not just to support the VCGC we tried in the 0.0.1 series, but to do a lot of other interesting things as well.

First of all, a write barrier would have let us defer actually allocating an object until a reference to it is stored somewhere, such as in an environment, in a register, as part of a data structure, or on the stack, permitting us to implement what amounts to a 1-bit reference count. Of course, that also means that the free list for BiBOP objects has to be implemented as a queue, or something of that nature. If a value is allocated from within a foreign function and never stored in a variable, the allocation technically uses no memory. Since we use __arc_mkaff everywhere to call foreign functions from C code, every such call produces a small allocation that almost always immediately turns into garbage. Many similar scenarios arise in the interpreter for values that are ephemeral in the same way, so many small objects may never need to be truly “allocated” at all, as they become garbage almost instantly.
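
Roughly, the deferred allocation might look something like the following sketch; the names and the doubly linked free queue here are illustrative assumptions, not actual Arcueid code:

#include <stdio.h>

/* Sketch of the 1-bit reference count idea: an object handed out by the
   allocator stays on the free queue, marked tentative, and is only truly
   removed when the write barrier sees a reference to it stored somewhere
   reachable.  Ephemeral values that are never stored cost nothing: the
   allocator just hands them out again after wrapping around the queue. */
typedef struct obj {
    struct obj *prev, *next;   /* doubly linked so unlinking is O(1) */
    int committed;             /* the 1-bit "reference count" */
    long payload;
} obj;

static obj pool[8];
static obj *head, *tail;       /* the free queue (FIFO) */

static void enqueue(obj *o)
{
    o->prev = tail; o->next = NULL;
    if (tail) tail->next = o; else head = o;
    tail = o;
}

static void unlink_obj(obj *o)
{
    if (o->prev) o->prev->next = o->next; else head = o->next;
    if (o->next) o->next->prev = o->prev; else tail = o->prev;
    o->prev = o->next = NULL;
}

/* Dequeue from the head, mark tentative, re-enqueue at the tail.  The
   queue discipline guarantees every other free object is handed out
   before this one can be reused. */
static obj *alloc_tentative(void)
{
    obj *o = head;
    unlink_obj(o);
    o->committed = 0;
    enqueue(o);
    return o;
}

/* The write barrier commits a tentative object the first time its
   reference is stored somewhere reachable from the root set. */
static void wb_store(obj **slot, obj *val)
{
    if (!val->committed) {
        val->committed = 1;
        unlink_obj(val);       /* now it is really allocated */
    }
    *slot = val;
}

int main(void)
{
    for (int i = 0; i < 8; i++) enqueue(&pool[i]);

    obj *ephemeral = alloc_tentative();   /* never stored: stays free */
    (void)ephemeral;

    obj *kept_slot = NULL;
    obj *kept = alloc_tentative();
    wb_store(&kept_slot, kept);           /* committed: off the free queue */

    int free_count = 0;
    for (obj *p = head; p; p = p->next) free_count++;
    printf("free objects: %d\n", free_count);   /* 7: only kept is gone */
    return 0;
}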

The garbage collector, as we use it now, also seems to slow the current code down significantly. Essentially, we perform a collection every time a cycle of all the other threads has finished executing, with a default quantum of 1024 instructions, and this slows down the code noticeably. I’ve been playing around with the Ackermann function as a benchmark, and up to (ack 3 4) Arcueid seems comparable in speed to reference Arc on Racket. For (ack 3 5), however, it already pauses noticeably on my machine, which I think may be due to the garbage collector, given that increasing the quantum to 65,536 produces a marked improvement, as GC is performed less often. I’m going to try to get profiler evidence for this as well.

If I’m going to stick with the mark and sweep GC rather than going back to VCGC, I wonder under what conditions it is best for the collector to run. As I said, we currently run it after a thread’s time slice completes, but this is not the only way it can be done. Ruby seems to invoke its GC every time the free lists become empty, and perhaps we can do something like that too: run the GC before creating a new BiBOP page, for instance. We could also measure how much memory we are using, set thresholds, and perform collections when those thresholds are reached.
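
A threshold-based trigger might look something like this sketch; the entry points run_gc and raw_alloc are hypothetical stubs standing in for the real collector and page allocator:

#include <stdio.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical hooks: a real implementation would invoke the mark and
   sweep collector and the BiBOP page allocator here. */
static size_t run_gc(void) { return 0; }        /* returns live bytes */
static void *raw_alloc(size_t n) { return malloc(n); }

static size_t allocated_since_gc;
static size_t gc_threshold = 1 << 20;           /* start at 1 MiB, say */

/* Collect only when enough allocation has happened since the last
   collection, instead of once per thread-dispatch cycle. */
static void *gc_alloc(size_t nbytes)
{
    if (allocated_since_gc >= gc_threshold) {
        size_t live = run_gc();
        allocated_since_gc = 0;
        if (gc_threshold < 2 * live)            /* keep headroom above live set */
            gc_threshold = 2 * live;
    }
    allocated_since_gc += nbytes;
    return raw_alloc(nbytes);
}

int main(void)
{
    char *p = gc_alloc(64);   /* triggers a GC only past the threshold */
    free(p);
    return 0;
}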

I’ve tried to design the system so that garbage collection is as abstracted as possible, so that GC algorithms can be swapped out easily. The only thing missing for now is a write barrier mechanism, which needs to be used whenever a value is written to anything that could potentially be reachable from the root set, viz.:

  1. Writes to registers that can contain Arcueid values, e.g. to the accumulator
  2. Writes to variables in an environment
  3. scar or scdr operations
  4. Writes to vector indices (including writes to hash tables)

This should be simple enough to do without making too many mistakes: change the macros that have hitherto been used for these operations so that they stop producing C lvalues, and every remaining bare write then fails to compile with an error like ‘lvalue required as left operand of assignment’. That tells us exactly where to put in a function or macro that incorporates the write barrier, as in the sketch below. It also provides natural hooks for a read barrier, should a GC algorithm ever need one.
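
Here is a tiny illustration of the trick; the macro shapes are simplified assumptions, not the real Arcueid macros, but the principle is the same: wrapping the accessor’s expansion in a unary plus makes it an rvalue, so every leftover bare write fails to compile and points at code that still needs the write barrier.

typedef unsigned long value;

struct env { value vars[8]; };

/* Old form: expands to an lvalue, so writes compile silently.       */
/* #define AV(e, i)  ((e)->vars[(i)])                                */

/* New form: unary + makes it an rvalue; every leftover bare write   */
/* now fails with "lvalue required as left operand of assignment".   */
#define AV(e, i)  (+(e)->vars[(i)])

/* All writes now go through a function that can invoke the barrier. */
static void wb_write(value *slot, value v)
{
    /* the GC write barrier hook would go here */
    *slot = v;
}
#define WV(e, i, v)  (wb_write(&(e)->vars[(i)], (v)))

int main(void)
{
    struct env e = { { 0 } };
    WV(&e, 0, 42);
    /* AV(&e, 0) = 1;  <- no longer compiles */
    return (int)AV(&e, 0);
}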

As it is, the compiler alone has at least 74 places where this needs to be updated… Now that I’ve made progress on these changes, it is gradually becoming obvious that the code should not be written in C. While I’ve done many clever things with the C preprocessor to support all of this business, I think I’ve just about reached the limits of what’s possible. The use of the AV(…) macro to access variables in the function’s local environment is error-prone enough as it is, and now I have a WV(…) macro to write to variables in the same way. The internal use of an unsigned char type for these variables has gone some way towards making this less error-prone (as the compiler may or may not complain about use of such a type in contexts where a value is required), but I really do think that some other language halfway between C and Arc is needed. Perhaps a PreArc, analogous to Richard A. Kelsey’s PreScheme? Something to consider.

Maybe I’ll call it Shiki?

Quick status update

2013-04-17

Well, it’s been seventeen days or so since I embarked upon this project, and the core of the experimental branch can now load and run all of arc.arc; it also passes the old unit tests that I created for the master branch way back when. The next step is to build a new thread dispatcher and REPL, and then we’ll have a 0.1.0 release. We’ll have no more nonsense about oddly built continuations made by copying the C stack, and instead more nonsense about a large number of functions using Duff’s device to provide what amounts to continuations on a trampoline. XD

On rewriting from scratch

2013-04-01

As one might surmise from reading my recent posts, or from watching the experimental branch of Arcueid on Github, I am furiously rewriting most of Arcueid from scratch in order to address some of the most serious architectural deficiencies I encountered while making the 0.0.x series. Joel Spolsky famously warned that rewriting from scratch is something one should never do, but I do think there are good reasons to do it. A page I found gives some good and bad reasons for rewriting from scratch, and I’d like to think I’ve embarked on this near-total rewrite for many of the good reasons and not too many of the bad. The original 0.0.x code base suffered from several fundamental architectural problems, the result of my trying to get something that would more or less work:

  1. To avoid the oddball foreign function interface I wound up inventing (based on ideas from Simon Tatham’s C coroutines), a fair amount of code used an alternate continuation mechanism that copied the entire C stack before doing a setjmp and later a longjmp, and this wound up getting used throughout most of the I/O functions. It did not play well with normal continuations, and things got hairy. I had actually considered using the copying technique everywhere (along with local variables produced by alloca), but I had no idea how that would interact with garbage collection, so I ran with a more streamlined version of the C coroutines trick instead.
  2. Boxed types used unions that grew in size as we layered stuff on top of them. This was a bad idea in retrospect, but it made things simple in the beginning.
  3. There was no real tail call optimisation done at all. Arc does not absolutely require it, but given that most Arc code, like Scheme code, implements iteration by tail recursion, it is rather important.

There are several more issues like these in the old code, and fixing them would involve rewriting nearly the entire code base. In some places using the old code was possible, and being lazy I did that as much as I could; in other places the original algorithms could be adapted to the new underlying framework (a semi-rewrite from scratch); and elsewhere this simply wasn’t possible and completely new code had to be written.

Anyhow, I think the core interpreter and the byte compiler are now complete, although the core built-in functions aren’t all there yet, so it isn’t ready for use (I knew that coerce was a complicated function, and rewriting it along with a complete suite of unit tests is proving to be a real chore). I hope we’ll have a new 0.1.0 release incorporating the new code before the end of the month.

Tail Recursive Stack Disciplines for an Interpreter: Arc Version

2013-03-19

As I mentioned previously, I have begun rewriting the interpreter core to address some of the deficiencies of the previous architecture, most notably how the C interface fits in, which seems like a rather fundamental flaw. I am presently reworking the core interpreter to fit the new architecture, and while I’m at it, I might as well make some optimisations that I should have tried before.

I ran across a paper by Richard A. Kelsey, Tail-Recursive Stack Disciplines for an Interpreter, where he describes some techniques for managing continuations and environments on the stack. What previous versions of Arcueid have done is essentially store all environments and continuations on the heap, as Appel suggests. This seems like the most logical thing to do, because Arc’s parameters are substantially more complicated than Scheme’s or those of other Lisp dialects: there are optional parameters, rest parameters, and most complicated of all, destructuring binds. Arcueid’s compiler copies all parameters off the stack and into the heap, performing destructuring binds and the evaluation of optional arguments as required, so that parameter access is as simple as possible. However, this is slow, because every parameter is copied from the stack to the heap, and tail calls, such as they are, produce large quantities of garbage that might not be immediately collected.

What we can try to do, however, is use the stack itself to store environments, as the values were already put there by the caller to begin with. Of course, optional arguments, rest parameters, and destructuring binds still need to be handled, and there are a number of approaches our interpreter can take to deal with them.

The env instruction that creates environments can instead be a three-parameter instruction that takes the number of flat arguments that should be present, the number of additional slots to be filled in by destructuring binds, and the number of optional arguments. The instruction does the following things:

  1. The first parameter determines the minimum portion of the stack that becomes part of the environment, with a destructuring bind argument counting as only one argument for these purposes. This is the minimum number of parameters that can be passed to the function.
  2. The second parameter determines how many destructuring bind parameters are added.
  3. The third parameter determines how many optional parameters can be expected from the stack.

The env instruction thus sets up the stack to hold an environment that includes all parameters that have been pushed on the stack, determines which optional parameters were unbound, and adds extra space for destructuring bind arguments. For example, for this rather complicated parameter list:

(fn (a (b c) d (o e d)) ... )

The env instruction would be env 3 2 1, making an environment that is six elements in size. The environment entries would thus be:

0 a
1 (destructuring bind)
2 d
3 b
4 c
5 e (may be unbound)

Additional instructions to destructure environment slot 1 and bind the results to environment slots 3 and 4 then follow, as does an instruction to check whether environment slot 5 is already bound to some value; if not, instructions to compute the default value and store it into the slot are executed.

Tentatively, the above parameters should compile to something like:

env 3 2 1
; Destructuring bind instructions
lde 0 1
car
ste 0 3
lde 0 1
cdr
car
ste 0 4
; Optional arguments
lde 0 5
jbnd L1
lde 0 2
ste 0 5
L1: ...

The new instruction jbnd (jump if bound) is used to jump if the value register is not the value CUNBOUND.

The envr instruction is used for parameter lists that end in a rest parameter, like this:

(fn (a (b c) d (o e d) . f) ... )

That would be envr 3 2 1, making an environment that is seven elements in size:

0 a
1 (destructuring bind)
2 d
3 b
4 c
5 e (may be unbound)
6 f (rest)

The rest parameter is bound up by the envr instruction itself, so the compiled code is the same as before, with the sole exception of envr replacing env as needed.

What happens when a closure is returned, say from a higher-order function, or when ccc creates a continuation? Several implementation strategies might work here, such as spaghetti stacks, or simply copying the whole stack, which might be expensive if ccc is used a lot but perhaps not so much in actual practice; a rough sketch of the copying approach follows. We’ll see just how all of this fits together.
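
As an illustration of the copying strategy (the structures below are assumptions made for the sketch, not Arcueid’s actual design), capturing a continuation would snapshot the live portion of the value stack into the heap, and restoring it would copy the snapshot back:

#include <stdlib.h>
#include <string.h>
#include <stdio.h>

/* A continuation captured by copying: a heap snapshot of the live
   portion of the interpreter's value stack. */
typedef struct {
    long *saved;
    size_t depth;
} continuation;

static continuation *capture(const long *stack, size_t depth)
{
    continuation *k = malloc(sizeof *k);
    k->saved = malloc(depth * sizeof *k->saved);
    memcpy(k->saved, stack, depth * sizeof *k->saved);
    k->depth = depth;
    return k;
}

/* Restoring discards the current stack contents and reinstates the
   snapshot; returns the new stack depth. */
static size_t restore(const continuation *k, long *stack)
{
    memcpy(stack, k->saved, k->depth * sizeof *k->saved);
    return k->depth;
}

int main(void)
{
    long stack[16] = { 1, 2, 3 };
    continuation *k = capture(stack, 3);
    stack[0] = 99;                       /* mutate after capture */
    size_t sp = restore(k, stack);
    printf("%zu %ld\n", sp, stack[0]);   /* prints: 3 1 */
    free(k->saved); free(k);
    return 0;
}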

A module system for Arc

2013-03-04

One of the things that made Perl as popular as it is today, I think, was the ability to extend the language easily and to build third-party libraries that minimise conflicts with one another. This is something that Arc presently lacks, and something it should eventually have. I suppose Paul Graham left it to evolve naturally, but so far, to my knowledge, this has not happened.

The existence of a standard module system would also help ameliorate incompatibility: most of the extensions I’m building into Arcueid should go into their own modules, except for things like the C99 mathematical functions (sin, cos, etc.) and additional network I/O functions (e.g. socket, select, socket-bind): in general, things that are logical completions of functionality already in the official Arc reference implementation. Things unique to the Arcueid implementation, such as the marshalling mechanism (CIEL) and the Limbo-inspired thread communication and synchronisation primitives, must all go into their own modules to limit pollution of the global namespace with incompatible symbols as far as possible.

The module mechanism would thus also live in its own module, but special syntax should be provided to facilitate its use. The module support functions, detailed below, are minimal, and all are defined in the top-level module along with all of the standard definitions.

Free variables are qualified as belonging to some module by the qualify function. The :: ssyntax operator expands to make use of the qualify function, e.g. MyModule::some-function becomes (qualify some-function MyModule). ::some-function qualifies some-function as belonging to the top-level module.

The module form opens a module, and all unqualified definitions within it are made in the module’s namespace. The value of an unqualified free variable is found by first looking among the module’s global symbols, and then searching for a binding of the symbol in the parent module, and so on until the top-level module (where all of the standard symbols live) is reached. All bindings of unqualified free variables are made against the module’s namespace.

The import function basically alters the behaviour of free variable binding resolution in all subsequent code, until the end of the scope where the import function was used. A free variable is searched for first in the current module, then in its parents as before. If it remains unbound, all imported modules are searched for a binding. The import function has no effect on unqualified definitions, which are always made against the current module.

I’m still on the fence as to whether modules must explicitly export their symbols so only exported symbols are visible to anything outside the module. This may serve to hide symbols that should be private to the module.

The module system may be implemented, with the exception of the ssyntax for module qualifiers, without hacking into the Arc core itself, I think, and may be doable using only macros, although it is likely to produce a performance hit.

This is a work in progress of course, and I’d like to hear what ideas the Arc community has for a standard module system for the language.

Arcueid’s foreign function interface

2013-03-01

One of the big problems I had with last year’s release of Arcueid was with continuations that needed to be created within C code itself. I used several troublesome techniques for dealing with this, which caused some of the problems the current 0.0.x series code has. I already made use of a technique based on Simon Tatham’s C coroutines to set this up, but it looks like I’m going to have to take the technique further and make more intensive use of it within the core for the new release I’m working on.

I have written a set of rather ugly macros that hide most of the gory details of these Arcueid Foreign Functions, so it becomes possible to write a foreign function that looks like this:

AFFDEF(doubler, func, a, b)
{
  AVAR(rval);                              /* local Arcueid value */
  AFBEGIN;

  AFCALL(AV(func), AV(a), AV(b));          /* apply func to a and b */
  AV(rval) = INT2FIX(2 * FIX2INT(AFCRV));  /* double the fixnum result */
  ARETURN(AV(rval));
  AFEND;
}
AFFEND

The definitions for these macros are in arcueid.h in the experimental branch of the code on Github, and probably look extremely strange. The AFBEGIN/AFEND macro pair essentially turns the body of the function into a large switch statement (yes, this is an instance of Duff’s device). Another oddity is that the AFCALL macro, perversely, uses a return statement to do its job! This is because these functions are executed by means of a sort of trampoline. The AFCALL macro basically does what one would expect a Lisp function application to do: create and save a continuation, push arguments on the stack, then jump to the function, by returning to the trampoline with the callee set up for the trampoline to execute. When the called function returns, the trampoline restores the continuation that the foreign function created.

Continuations are created for C functions within the AFCALL macro by using the __LINE__ macro provided by the C preprocessor. This is used as the “offset into the function” as it were, and goes into the continuation as such. It is then used as a case label within the giant switch statement that the AFBEGIN/AFEND macros created. Thus, for the trampoline to restore such a C continuation, it has to call the function again with the offset available for the switch statement to see.
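To make the mechanics a little more concrete, here is a stripped-down sketch of the underlying coroutine trick; this is not Arcueid’s actual macro set, just the same idea in miniature:

#include <stdio.h>

/* The function body becomes one big switch, and __LINE__ serves as the
   "offset into the function" that the continuation saves and the
   trampoline resumes at. */
#define CORO_BEGIN(s)    switch (*(s)) { case 0:
#define CORO_YIELD(s, x) do { *(s) = __LINE__; return (x); \
                              case __LINE__:; } while (0)
#define CORO_END         } return -1

static int gen(int *state)
{
    CORO_BEGIN(state);
    CORO_YIELD(state, 10);   /* first call returns 10 and saves its spot */
    CORO_YIELD(state, 20);   /* second call resumes here and returns 20 */
    CORO_END;
}

int main(void)
{
    int state = 0;
    printf("%d\n", gen(&state));  /* 10 */
    printf("%d\n", gen(&state));  /* 20 */
    printf("%d\n", gen(&state));  /* -1: exhausted */
    return 0;
}
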

There are a lot of other weird techniques in the other macros as well, such as AFFDEF(…), which makes some rather creative use of C’s variadic macros so that an Arcueid Foreign Function definition looks more reasonable. The techniques for doing this are based on answers I found here.

This method of defining foreign functions looks reasonably clean, I should think, and the macros do a good job of hiding the black magic that the foreign function interface has to do. There are also mechanisms for causing the trampoline to pause execution on a non-ready file descriptor, for instance, something that caused me a lot of problems in my first attempt.

The care and feeding of cyclic data structures

2013-02-21

I’ve finally managed to find some time to do a little more work on Arcueid, and I’m now engaged in a major code reorganisation and simplification that should make things a bit cleaner. I also have a better idea of how to fix some of the most serious problems in the 0.0.x series of the interpreter: perhaps we’ll have a 0.1.0 series soon. More on that anon.

Anyhow, what this post is really about is something that has been bothering me about the current Arc interpreter for a long time. It is very easy to hang the interpreter by simply doing the following:


arc> (= x '(1 2))
arc> (scdr x x)

This sends arc3 and every version of Anarki I have tried into an infinite loop. It seems like it shouldn’t be too hard to fix, and in the experimental branch of Arcueid I have made attempts to deal with cyclic structures like this one by using a hash table that keeps track of what goes where. The pretty printer algorithm I’m testing for the new version now displays something a bit more reasonable:


arc> (= x '(1 2))
arc> (scdr x x)
(1 . (...))

There are other, more sophisticated possibilities for what to display, but I don’t think they’re all that useful for now. For example, the above structure could be displayed as:


arc> (= x '(1 2))
arc> (scdr x x)
#0=(1 . #0#)

as some Scheme interpreters do. Maybe we’ll eventually do something similar.

It is essentially impossible to use cyclic structures in meaningful ways within Arc at the moment, thanks to the rather naïve algorithms used by even some fairly simple functions like iso. The new version I have in the works does what amounts to a depth-first search whenever traversing a data structure that might be cyclic, using a hash table to keep track of what has already been visited.
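
To illustrate, here is a simplified sketch of cycle-aware printing; the real code uses a hash table, while this sketch makes do with a linear list of visited conses:

#include <stdio.h>

/* A tagged value: either a fixnum or a cons cell. */
typedef struct obj obj;
struct obj {
    enum { FIXNUM, CONS } type;
    long num;            /* FIXNUM */
    obj *car, *cdr;      /* CONS */
};

/* Conses already visited; a real implementation would hash these. */
static const obj *seen[256];
static int nseen;

static int was_seen(const obj *p)
{
    for (int i = 0; i < nseen; i++)
        if (seen[i] == p) return 1;
    return 0;
}

static void print1(const obj *x)
{
    if (x == NULL)         { printf("nil"); return; }
    if (x->type == FIXNUM) { printf("%ld", x->num); return; }
    if (was_seen(x))       { printf("(...)"); return; }  /* cycle: cut off */
    seen[nseen++] = x;
    printf("(");
    print1(x->car);
    printf(" . ");
    print1(x->cdr);
    printf(")");
}

int main(void)
{
    /* build x = (1 2), then do the equivalent of (scdr x x) */
    obj two = { CONS, 0, NULL, NULL };
    obj one = { CONS, 0, NULL, &two };
    obj n1  = { FIXNUM, 1, NULL, NULL };
    obj n2  = { FIXNUM, 2, NULL, NULL };
    one.car = &n1; two.car = &n2;
    one.cdr = &one;          /* the scdr: make the list circular */
    print1(&one);            /* prints (1 . (...)) */
    printf("\n");
    return 0;
}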