Big bag of pages
Implementing the big bag of pages memory allocator proved to be not all that difficult. Arcueid still uses malloc to allocate objects greater than 512 bytes in size, but smaller objects are allocated using the big bag of pages. A simple free list is maintained for each of the sizes from 1 to 512 bytes, and if the free list for a particular size is found empty, a new chunk of memory (marked immutable) is allocated that should be big enough to hold up to 64 (by default) objects of that size. This chunk is carved out into objects of the requested size and added to the free list.
This approach has shown a large performance increase. Running the ab test as below produces the following results:
$ ab -n 100 -c 10 http://localhost:8080/said
This is ApacheBench, Version 2.3
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/
Benchmarking localhost (be patient).....done
Server Software:
Server Hostname: localhost
Server Port: 8080
Document Path: /said
Document Length: 166 bytes
Concurrency Level: 10
Time taken for tests: 5.282 seconds
Complete requests: 100
Failed requests: 0
Write errors: 0
Total transferred: 24000 bytes
HTML transferred: 16600 bytes
Requests per second: 18.93 [#/sec] (mean)
Time per request: 528.151 [ms] (mean)
Time per request: 52.815 [ms] (mean, across all concurrent requests)
Transfer rate: 4.44 [Kbytes/sec] received
Connection Times (ms)
min mean[+/-sd] median max
Connect: 0 0 0.1 0 1
Processing: 440 522 99.9 497 857
Waiting: 408 489 95.8 464 808
Total: 440 523 100.0 497 857
Percentage of the requests served within a certain time (ms)
50% 497
66% 508
75% 518
80% 522
90% 615
95% 850
98% 856
99% 857
100% 857 (longest request)
This is almost twice the speed , showing that memory allocations are a large factor dominating performance: obviously the simple free list that is now used for objects in our big bag of pages is much, much faster than malloc. This has also reduced total memory usage somewhat, though not that much, using 37 megs on initial load of arc.arc, 150 megs after loading libs.arc, and 253 megs after the above test, vs. 41, 170, and 283 respectively before implementing BiBOP.
One thing that the current BiBOP implementation cannot do is release memory back to the OS. Trying to make a shot at doing this will in any case make the free list not such a simple free list any more, and hence slow down memory allocations as well. We’ll probably have to use mmap instead of malloc to get such memory blocks, and the block size should be made larger than it is today, and some algorithms need to be devised for managing the multiple free lists efficiently.
One other thing: we are getting very close to being able to run news.arc! Some bug fixes involving ssyntax (available in git head) now allow news.arc to finally load correctly. However, there still seem to be other bugs though that prevent it from running properly, that’s also being fixed.
