Tail recursion
The current code in the Git repository today is actually capable of running srv.arc properly, and can now run the Arc challenge code more or less stably. There are a couple of serious problems though: 1. it’s dog slow, and 2. it uses too much memory. Using ab to make 100 accesses at a concurrency level of 10 produces the following dismal results:
$ ab -n 100 -c 10 http://localhost:8080/said This is ApacheBench, Version 2.3 Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/ Licensed to The Apache Software Foundation, http://www.apache.org/ Benchmarking localhost (be patient).....done Server Software: Server Hostname: localhost Server Port: 8080 Document Path: /said Document Length: 166 bytes Concurrency Level: 10 Time taken for tests: 10.263 seconds Complete requests: 100 Failed requests: 0 Write errors: 0 Total transferred: 24000 bytes HTML transferred: 16600 bytes Requests per second: 9.74 [#/sec] (mean) Time per request: 1026.313 [ms] (mean) Time per request: 102.631 [ms] (mean, across all concurrent requests) Transfer rate: 2.28 [Kbytes/sec] received Connection Times (ms) min mean[+/-sd] median max Connect: 0 0 0.1 0 0 Processing: 943 1021 43.3 1015 1170 Waiting: 838 933 47.3 927 1102 Total: 943 1021 43.2 1015 1170 Percentage of the requests served within a certain time (ms) 50% 1015 66% 1033 75% 1042 80% 1048 90% 1064 95% 1125 98% 1166 99% 1170 100% 1170 (longest request)
Reference Arc is at least an order of magnitude faster. And meanwhile the Arcueid process has balooned to 1.5 gigabytes RSS. I think I know part of the reason why this is happening: Arcueid does not yet do proper tail recursion. There are large numbers of loops inside of srv.arc that make use of tail recursion, and their repeated runs make useless copies of their environments accumulate over time.
I just ran across a paper by Richard A. Kelsey (the author of Scheme48 among other things), that addresses precisely the implementation issue here. There are three methods described in Kelsey’s paper to address this: 1. Place environments in the heap instead of a stack and trust the real garbage collector to take care of business, 2. Overwrite environments on tail recursion, and 3. stack garbage collection.
The first method described in Kelsey’s paper, based on an idea by Andrew Appel, is also the simplest to implement. It is also the slowest, based on Kelsey’s benchmarks. The virtual machine need only stop making new continuations for every tail call in order to do this. There are some complications involved here though, because error handling depends on the existence of a current continuation (and a tail call may not have one), but that should not be hard to address.
The second method does not seem that hard to implement either. What has to be done in this case is just not to make new environments on tail recursive calls. The old environment will still be used, and a new one will not be allocated. This can be done by skipping over the env
instruction that creates a new environment on a tail recursive call or modifying the env
instruction to detect tail recursion.
The third method may be something we can consider implementing later on, but given the results in Kelsey’s paper it does not seem to be worth the additional complication, as it is not that much faster than the second method.
[…] the virtual size has gone down from over 400 megs to about 170. After using ab to do the same test as below, virtual size is 283 megs, from over 1.6 gigs before. It’s still a lot, but a lot less than […]
Watch the size of your structs… « Arcueid 弧 said this on 2012-02-15 at 16:26 |