Changing garbage collectors
Well, it seems that replacing the old garbage collector with VCGC improved matters markedly. The results of ApacheBench with the new garbage collector are as follows:
$ ab -n 100 -c 1 http://localhost:8080/news
This is ApacheBench, Version 2.3
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/
Benchmarking localhost (be patient).....done
Server Software:
Server Hostname: localhost
Server Port: 8080
Document Path: /news
Document Length: 2260 bytes
Concurrency Level: 1
Time taken for tests: 8.271 seconds
Complete requests: 100
Failed requests: 0
Write errors: 0
Total transferred: 233400 bytes
HTML transferred: 226000 bytes
Requests per second: 12.09 [#/sec] (mean)
Time per request: 82.710 [ms] (mean)
Time per request: 82.710 [ms] (mean, across all concurrent requests)
Transfer rate: 27.56 [Kbytes/sec] received
Connection Times (ms)
min mean[+/-sd] median max
Connect: 0 0 0.0 0 0
Processing: 72 83 20.6 76 193
Waiting: 41 49 14.9 44 134
Total: 72 83 20.6 76 193
Percentage of the requests served within a certain time (ms)
50% 76
66% 77
75% 79
80% 83
90% 98
95% 132
98% 169
99% 193
100% 193 (longest request)
This is a rather major improvement I should think. Using VCGC improved matters by a factor of nearly 3 on average, and is only 1.6 times slower than disabling garbage collection completely. It’s still a long way from Anarki which can still get 1.865 ms on average on my machine.
Here’s what the profiler finds:
Each sample counts as 0.01 seconds.
% cumulative self self total
time seconds seconds calls s/call s/call name
10.07 2.24 2.24 3185 0.00 0.00 gc
8.34 4.10 1.86 144932952 0.00 0.00 TYPE
6.97 5.65 1.55 57782583 0.00 0.00 envval
4.63 6.68 1.03 88860737 0.00 0.00 nextcont
4.00 7.57 0.89 83769798 0.00 0.00 TYPE
3.69 8.39 0.82 89950076 0.00 0.00 contenv
3.53 9.17 0.79 69084312 0.00 0.00 mark
3.33 9.91 0.74 6093257 0.00 0.00 arc_hash_final
3.19 10.62 0.71 188313998 0.00 0.00 TYPE
3.06 11.30 0.68 72275198 0.00 0.00 MARKPROP
2.83 11.93 0.63 1129927 0.00 0.00 __arc_update_cont_envs
2.38 12.46 0.53 7127191 0.00 0.00 bibop_alloc
1.80 12.86 0.40 15619398 0.00 0.00 arc_hash_update
1.75 13.25 0.39 434522 0.00 0.00 __arc_vector_marker
1.66 13.62 0.37 40391647 0.00 0.00 __arc_getenv
1.53 13.96 0.34 71091716 0.00 0.00 TENVR
1.53 14.30 0.34 1436710 0.00 0.00 __arc_vmengine
1.39 14.61 0.31 72274982 0.00 0.00 __arc_wb
The garbage collector is still the largest component, but its proportion has gone down from nearly 30% from the previous version to only 10%. The next largest contributor is the TYPE function, which is supposed to be inlined. It seems the system is spending 8% of its time doing type dispatching. It makes 145 million such calls when attempting to serve the news page 100 times, so anything we can do to either reduce the number of times this is called or make it run faster will improve matters greatly.
However, I think this is a battle we can fight after the next release, and before then I’ll work on adding some small improvements as well, such as command line parameters, load paths, additional functions, and so forth.