Memory fragmentation and malloc

•2012-02-14 • Leave a Comment

The resident set size of the Arcueid process on my machine, after it loads arc.arc, clocks in at 99 megabytes, and uses 124 megabytes of virtual memory. This is higher than I expected, when attempting to traverse the root set and all allocated memory as kept track of by the garbage collector only amounts to less than 10 megabytes. Since at the moment we use malloc and free to do all our memory allocation and freeing, we are probably getting lots and lots of fragmentation, which became obvious after looking at the memory map of an Arcueid process. Well, now I learn the hard way why no real high-level language runtime does things that way! I used to have some block memory management code before where I would use malloc only to allocate large blocks of memory (memory pools), which I would then free only if they ever became empty, but debugging this was dreary work so I decided to just use malloc/free for the time being. I knew that fragmentation would become a problem but I had not realized just how badly it would affect an application such as this. In hindsight it must be obvious, given that we are allocating thousands of small objects like cons cells just to load and compile arc.arc, some of which become garbage and others persist indefinitely.

I will thus have to write some form of manual memory allocator that carves out large blocks of memory using malloc/mmap/etc., and then parcels portions of this out to programs as they need them, maintaining a free list to manage allocations. One thing I can do is maintain one set of memory pools for all cell structs (which are the fundamental building block for all of Arcueid’s data types), and another pool for everything else, e.g. immutable memory for things like hash tables, irregularly sized cell structs for things like vectors, etc. This is a simplified version of the big bag of pages that recognizes only two basic objects: cells and non-cells.

There seem to be many different memory allocation algorithms out there of varying degrees of complexity, but for the time being let’s just keep things simple, and maintain just a basic free list sorted by ascending addresses in a block. We’ll optimize the cell case by maintaining separate memory pools for regular cells as they are the most common form of memory allocation. Expect to see this in version 0.0.11.

Tail recursion

•2012-02-11 • 1 Comment

The current code in the Git repository today is actually capable of running srv.arc properly, and can now run the Arc challenge code more or less stably. There are a couple of serious problems though: 1. it’s dog slow, and 2. it uses too much memory. Using ab to make 100 accesses at a concurrency level of 10 produces the following dismal results:

$ ab -n 100 -c 10 http://localhost:8080/said
This is ApacheBench, Version 2.3 
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/

Benchmarking localhost (be patient).....done

Server Software:        
Server Hostname:        localhost
Server Port:            8080

Document Path:          /said
Document Length:        166 bytes

Concurrency Level:      10
Time taken for tests:   10.263 seconds
Complete requests:      100
Failed requests:        0
Write errors:           0
Total transferred:      24000 bytes
HTML transferred:       16600 bytes
Requests per second:    9.74 [#/sec] (mean)
Time per request:       1026.313 [ms] (mean)
Time per request:       102.631 [ms] (mean, across all concurrent requests)
Transfer rate:          2.28 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0    0   0.1      0       0
Processing:   943 1021  43.3   1015    1170
Waiting:      838  933  47.3    927    1102
Total:        943 1021  43.2   1015    1170

Percentage of the requests served within a certain time (ms)
  50%   1015
  66%   1033
  75%   1042
  80%   1048
  90%   1064
  95%   1125
  98%   1166
  99%   1170
 100%   1170 (longest request)

Reference Arc is at least an order of magnitude faster. And meanwhile the Arcueid process has balooned to 1.5 gigabytes RSS. I think I know part of the reason why this is happening: Arcueid does not yet do proper tail recursion. There are large numbers of loops inside of srv.arc that make use of tail recursion, and their repeated runs make useless copies of their environments accumulate over time.

I just ran across a paper by Richard A. Kelsey (the author of Scheme48 among other things), that addresses precisely the implementation issue here. There are three methods described in Kelsey’s paper to address this: 1. Place environments in the heap instead of a stack and trust the real garbage collector to take care of business, 2. Overwrite environments on tail recursion, and 3. stack garbage collection.

The first method described in Kelsey’s paper, based on an idea by Andrew Appel, is also the simplest to  implement. It is also the slowest, based on Kelsey’s benchmarks. The virtual machine need only stop making new continuations for every tail call in order to do this. There are some complications involved here though, because error handling depends on the existence of a current continuation (and a tail call may not have one), but that should not be hard to address.

The second method does not seem that hard to implement either. What has to be done in this case is just not to make new environments on tail recursive calls. The old environment will still be used, and a new one will not be allocated. This can be done by skipping over the env instruction that creates a new environment on a tail recursive call or modifying the env instruction to detect tail recursion.

The third method may be something we can consider implementing later on, but given the results in Kelsey’s paper it does not seem to be worth the additional complication, as it is not that much faster than the second method.

Symbols and symbol garbage collection: Arcueid vs. Ruby

•2012-02-10 • Leave a Comment

I don’t really know how other Lisp implementations represent symbols, but I am somewhat familiar with how Ruby does it, having studied the source code a while back when I was hoping to be able to contribute to the efforts underway some years ago to improve Matz’s Ruby Interpreter (MRI), that later became the Ruby 1.9 series. Alas, my attempts to break into hacking the interpreter never got anywhere as I got tangled in my other duties, but I digress.

Anyhow, one thing that Arcueid has to do that Ruby doesn’t is symbol garbage collection. Symbols in Ruby are known as a possible source of memory leakage, as such creating large numbers of symbols at runtime is a practice frowned upon in Ruby programming. However, creating large numbers of symbols at runtime is standard Arc programming practice, with functions like uniq that are widely used in macro programming. I thus realized that not having symbol garbage collection was unacceptable for any Arc interpreter, so I had to take a shot at doing it.

Arcueid, influenced by MRI, represents symbols as what are essentially small-ish integers. There is a lastsym entry in the arc struct that is incremented every time arc_intern is used to intern a new symbol that has never been interned before. This becomes the symbol’s identifier (ID), and it is stored into two hash tables, which are fields symtable and rsymtable in the arc struct. The symtable is indexed by the names of symbols, and gives their ID’s, while the rsymtable is indexed by symbol IDs and gives their names. The symbol ID is converted into an actual symbol value by the macro ID2SYM (this macro is also present in MRI and does more or less the same thing), and a SYM2ID macro does the reverse when symbols are being manipulated. The arc_intern function does all this, looking for the symbol name in the symtable, returning its value, otherwise installing the new symbol in the symbol tables. This has the nice effect of making symbols cheap to compare against each other, as they are immediate values just like fixnums, and takes care of making symbols unique in the interpreter. However, I realized soon enough that this approach made garbage collecting symbols somewhat complicated, and this was probably the reason why MRI doesn’t do it at all. I nevertheless came up with a solution, which proved not to be all that complicated, but I had stumbled several times trying to get it to work right in all cases.

Anyway, my approach to garbage collecting symbols represented in this way involves treating the symbol tables specially. MRI makes the symbol tables part of the garbage collector root set, which obviously makes garbage collecting symbols impossible. What Arcueid does is something less than that. It marks the symbol table objects as immutable, and as such, they can never be garbage collected themselves. The buckets inside the symbol tables, which represent the actual symbol data, are not immutable though, and a bucket in the symbol tables must be referenced by some other object reachable from the root set, or else it will not be marked and will wind up getting freed by the sweeper. When the GC marker sees a symbol it will then look for the symbol’s buckets in symtable and rsymtable, and will mark them. Symbols which are not seen in this way will not have their buckets in the symbol tables marked, and they will be removed by the sweeper, which has to remove the entries in the slots of the symbol tables they used also. I ran into a number of pitfalls here but I think I’ve corrected most all of them in the code in Git today.

Well, when a symbol that has been garbage collected is created again somehow, it will obviously wind up with a different actual ID (a new one) from when it was created the first time. But I will ask why this should matter, because when it got garbage collected the first time no part of the garbage collection root set knew about its existence. But if it matters (and I’m not convinced that it does) I can think of a very easy way to get around this. Compute a hash value based on the string representation of the symbol with some suitable hash function (I use this hash function in Arcueid), and use that hash value as its ID rather than having a monotonically incrementing ID value as above. Hmm, I may actually use this approach in the future as it may obviate the need for having a reverse and forward symbol table. Something to consider.

I don’t see any reason why Ruby cannot use this approach as well. The only problem may be that Ruby does not treat hash table buckets as separate objects in their own right, allowing them to be garbage collected on their own. I had to write my own hash table code to do this explicitly.

Arcueid 0.0.9 released

•2012-02-09 • Leave a Comment

Arcueid 0.0.9 has been released. You can download it here.

Release notes for Version 0.0.9:

  • Bug fixes for destructuring binds and hash applications
  • Fixes for some functions (close, setuid, flushout, etc.)
  • dynamic-wind introduced as a primitive
  • Continuation mark mechanism (call-w/cmark) introduced
  • call-w/std(in|out) redefined in terms of continuation marks
  • File I/O with multiple threads active fixed

It can almost run srv.arc but there are still some issues here and there.

Continuation marks in Arcueid

•2012-02-08 • Leave a Comment

In order to generalize the notion of call-w/std(in|out), Arcueid provides a more general call-w/cmark macro that allows one to bind arbitrary continuation marks when a thunk is called. Since Arcueid’s built-in stdin, stdout, and stderr functions look for continuation marks named stdin-fd, stdout-fd, and stderr-fd respectively, using call-w/cmark with these continuation marks suffices to implement them. If no marks are bound to these names, then the global variables with these values are used, these being normally bound to stdin, stdout, and stderr ports that are created when the system initializes.

Arcueid’s implementation of continuation marks is very simple. Three functions are provided:

  1. cmark – retrieves the binding of an existing continuation mark if one is available.
  2. scmark – binds a continuation mark, covering the previous value if one exists
  3. ccmark – unbinds a continuation mark, restoring the previous value if one exists

The scmark and ccmark functions should never be used outside of the provided call-w/cmark macro. This itself is a wrapper around dynamic-wind, ensuring that the continuation marks created for a thunk are bound when the thunk is entered, and unbound when the thunk is exited. The definition is very simple:

(mac call-w/cmark (thunk . cmarks)
  `(dynamic-wind (fn () ,@(map1 [list 'scmark car._ cadr._] (pair cmarks)))
		 ,thunk
		 (fn () ,@(map1 [list 'ccmark car._] (pair cmarks)))))

As one can see, it is just a call to dynamic-wind. The before thunk uses scmark to bind all the specified continuation marks, and the after thunk uses ccmark to unbind all continuation marks.

Every thread has a hash table that stores these continuation marks. New threads receive a flattened copy of their parent’s continuation marks as they existed at the time the parent spawned it. Values bound to a thread’s continuation mark table in this way cannot be removed by the child thread.

It looks like dynamic-wind is seeing much more use as a primitive than expected!

No other C implementations of Arc?

•2012-02-08 • Leave a Comment

I would have thought that someone else would have tried to make a C implementation of Arc too, given how cumbersome installing and using Arc has been, with all the name changes and incompatibilities introduced by MzScheme/PLT Scheme/Racket/whatever the hell they’re calling themselves these days. All the popular scripting languages have had C implementations and as such could be immediately installed without any other dependencies other than that of the base system. Given that C is practically the lingua franca of modern computing, I would have thought that such an implementation would have had great importance. If such a thing existed earlier, Arc might well have become much more popular than it is today.

I had to stop working on the project for more than three years as I had too many other things to do that paid the bills, so the code base lay fallow for some time. When I decided to work on it seriously again late last year, I tried to look for other projects that have tried to do what I do, and found nothing. Today, googling for “Arc implemented in C” gets my original post on the Arc Forum announcing Arcueid  as the third hit (the top two being irrelevant references to automatic reference counting, or ARC, in Objective-C programming for iOS).

Well, I suppose I just have to soldier on, and hope that the project attracts some more attention. Arcueid 0.0.8 is in more or less a usable state, and any bug reports and patches are very much welcome.

Exceptions with the Hanson-Lamping algorithm

•2012-02-08 • Leave a Comment

The revision previously mentioned about using the Hanson-Lamping algorithm has to take exceptions into account obviously. Exception handlers registered through on-err would reside on their own stack, the elements of which consisting of vectors with the following entries:

  1. The closure of the exception handler itself
  2. The current continuation for the call to on-err
  3. The current value of the here stack (TCH) of the current thread

Exceptions are raised by checking the top of the exception handler stack. If it is empty, the default exception handling behavior is executed. Otherwise, the top stack entry is obtained, and the following occurs:

  1. The TCH is rerooted to the saved value of the TCH preserved when the exception handler was created
  2. The functions thus obtained are executed, so all functions registered by intervening protect or dynamic-wind invocations get executed.
  3. The exception handler is removed from the top of the stack.
  4. The saved continuation (which was the continuation created by the call to on-err) becomes the current continuation.
  5. The exception handler is executed, with the exception that was raised as its argument

It seems that reference Arc handles the case of a protect clause raising its own error by catching the exception thus thrown, so the original exception is lost. That’s what this algorithm would do, if we do not remove the exception handler from the stack of exception handlers until after all the protect clauses are executed.

Revising Arcueid’s implementation of protect

•2012-02-07 • Leave a Comment

I had a very lively discussion with some of the folks at the Arc Forum when Arcueid first began to make a shot at implementing the protect function (which is Arc’s hook into Scheme’s dynamic-wind). This proved a messy undertaking from the start, and I am not very happy with how it exists in the current release. It works alright for now, but the fact that it is so complicated makes me think that there are corner cases that it doesn’t handle properly. I should begin revising the code to do this to make use of the Hanson-Lamping algorithm discussed there.

While I do so, the implementation of call-w/std(in|out) could use more than a little revision. I should begin using the more general notion of continuation marks and parameters that became the subject of another lively discussion I had on the Arc Forum once Arcueid began implementing those features.

This requires ripping out most of the guts of Arcueid’s continuation structure, simplifying it and removing all the cruft that has accumulated there over time. There’ll be a second branch for this and we’ll see about merging the two later on once the implementation is stable.

Many thanks to rocketnia (Ross Angle), for describing the Hanson-Lamping algorithm so I could understand it.

Arcueid: Introduction

•2012-02-07 • Leave a Comment

Around 2009 or thereabouts I started a project to make an implementation of Paul Graham’s Arc dialect of Lisp, and while I left the code fallow for quite some time and never made any sort of official announcement of it, I plan to change that, and thus I have inflicted my code on the world.

The current release as of this writing is Arcueid 0.0.8 and it implements pretty much all of Arc 3.1, although it is still buggy and some features are not correctly implemented such that Arc 3.1’s srv.arc crashes when it is used.

Well, one of the first things the Arc community asked me was where I got the name Arcueid. Well, I’m a bit of an anime/manga fan and one of my favorites is a show called Shingetsutan Tsukihime (真月譚月姫, in English Tsukihime, Lunar Legend). The main heroine is named Arcueid Brunestud (アルクェイド・ブリュンスタッド) and I thought the name was just too good to pass up for a project such as this. I can only hope that Type Moon doesn’t give me problems about this choice of name later on! Arcueid is pronounced like this. The guy with glasses (Shiki Tohno) says her name.

One of these days I’ll design a logo for Arcueid based on the Arc logo:

And the seal script Han character 弧 (‘hú’ in Pinyin, or ‘ko’ in Japanese on-yomi), which means an arc or arch.

But of course development comes first…