Tail recursion
The current code in the Git repository today is actually capable of running srv.arc properly, and can now run the Arc challenge code more or less stably. There are a couple of serious problems though: 1. it’s dog slow, and 2. it uses too much memory. Using ab to make 100 accesses at a concurrency level of 10 produces the following dismal results:
$ ab -n 100 -c 10 http://localhost:8080/said
This is ApacheBench, Version 2.3
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/
Benchmarking localhost (be patient).....done
Server Software:
Server Hostname: localhost
Server Port: 8080
Document Path: /said
Document Length: 166 bytes
Concurrency Level: 10
Time taken for tests: 10.263 seconds
Complete requests: 100
Failed requests: 0
Write errors: 0
Total transferred: 24000 bytes
HTML transferred: 16600 bytes
Requests per second: 9.74 [#/sec] (mean)
Time per request: 1026.313 [ms] (mean)
Time per request: 102.631 [ms] (mean, across all concurrent requests)
Transfer rate: 2.28 [Kbytes/sec] received
Connection Times (ms)
min mean[+/-sd] median max
Connect: 0 0 0.1 0 0
Processing: 943 1021 43.3 1015 1170
Waiting: 838 933 47.3 927 1102
Total: 943 1021 43.2 1015 1170
Percentage of the requests served within a certain time (ms)
50% 1015
66% 1033
75% 1042
80% 1048
90% 1064
95% 1125
98% 1166
99% 1170
100% 1170 (longest request)
Reference Arc is at least an order of magnitude faster. And meanwhile the Arcueid process has balooned to 1.5 gigabytes RSS. I think I know part of the reason why this is happening: Arcueid does not yet do proper tail recursion. There are large numbers of loops inside of srv.arc that make use of tail recursion, and their repeated runs make useless copies of their environments accumulate over time.
I just ran across a paper by Richard A. Kelsey (the author of Scheme48 among other things), that addresses precisely the implementation issue here. There are three methods described in Kelsey’s paper to address this: 1. Place environments in the heap instead of a stack and trust the real garbage collector to take care of business, 2. Overwrite environments on tail recursion, and 3. stack garbage collection.
The first method described in Kelsey’s paper, based on an idea by Andrew Appel, is also the simplest to implement. It is also the slowest, based on Kelsey’s benchmarks. The virtual machine need only stop making new continuations for every tail call in order to do this. There are some complications involved here though, because error handling depends on the existence of a current continuation (and a tail call may not have one), but that should not be hard to address.
The second method does not seem that hard to implement either. What has to be done in this case is just not to make new environments on tail recursive calls. The old environment will still be used, and a new one will not be allocated. This can be done by skipping over the env instruction that creates a new environment on a tail recursive call or modifying the env instruction to detect tail recursion.
The third method may be something we can consider implementing later on, but given the results in Kelsey’s paper it does not seem to be worth the additional complication, as it is not that much faster than the second method.

[…] the virtual size has gone down from over 400 megs to about 170. After using ab to do the same test as below, virtual size is 283 megs, from over 1.6 gigs before. It’s still a lot, but a lot less than […]
Watch the size of your structs… « Arcueid 弧 said this on 2012-02-15 at 16:26 |