Big bag of pages
Implementing the big bag of pages memory allocator proved to be not all that difficult. Arcueid still uses malloc to allocate objects greater than 512 bytes in size, but smaller objects are allocated using the big bag of pages. A simple free list is maintained for each of the sizes from 1 to 512 bytes, and if the free list for a particular size is found empty, a new chunk of memory (marked immutable) is allocated that should be big enough to hold up to 64 (by default) objects of that size. This chunk is carved out into objects of the requested size and added to the free list.
This approach has shown a large performance increase. Running the ab test as below produces the following results:
$ ab -n 100 -c 10 http://localhost:8080/said This is ApacheBench, Version 2.3 Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/ Licensed to The Apache Software Foundation, http://www.apache.org/ Benchmarking localhost (be patient).....done Server Software: Server Hostname: localhost Server Port: 8080 Document Path: /said Document Length: 166 bytes Concurrency Level: 10 Time taken for tests: 5.282 seconds Complete requests: 100 Failed requests: 0 Write errors: 0 Total transferred: 24000 bytes HTML transferred: 16600 bytes Requests per second: 18.93 [#/sec] (mean) Time per request: 528.151 [ms] (mean) Time per request: 52.815 [ms] (mean, across all concurrent requests) Transfer rate: 4.44 [Kbytes/sec] received Connection Times (ms) min mean[+/-sd] median max Connect: 0 0 0.1 0 1 Processing: 440 522 99.9 497 857 Waiting: 408 489 95.8 464 808 Total: 440 523 100.0 497 857 Percentage of the requests served within a certain time (ms) 50% 497 66% 508 75% 518 80% 522 90% 615 95% 850 98% 856 99% 857 100% 857 (longest request)
This is almost twice the speed , showing that memory allocations are a large factor dominating performance: obviously the simple free list that is now used for objects in our big bag of pages is much, much faster than malloc. This has also reduced total memory usage somewhat, though not that much, using 37 megs on initial load of arc.arc, 150 megs after loading libs.arc, and 253 megs after the above test, vs. 41, 170, and 283 respectively before implementing BiBOP.
One thing that the current BiBOP implementation cannot do is release memory back to the OS. Trying to make a shot at doing this will in any case make the free list not such a simple free list any more, and hence slow down memory allocations as well. We’ll probably have to use mmap instead of malloc to get such memory blocks, and the block size should be made larger than it is today, and some algorithms need to be devised for managing the multiple free lists efficiently.
One other thing: we are getting very close to being able to run news.arc! Some bug fixes involving ssyntax (available in git head) now allow news.arc to finally load correctly. However, there still seem to be other bugs though that prevent it from running properly, that’s also being fixed.