More performance tuning

•2013-05-15 • Leave a Comment

After doing a few more trials, I noticed that the GC quantum is set far too high, and we are using the full-blown TYPE function to decide whether something is a symbol or not, and the GC actually winds up running a lot, so I made a few minor changes that improved performance slightly. Reducing the GC quantum from 4096 objects checked per cycle to 768 had some effects (and for the final cut in the next formal release I think I’ll be setting the GC quantum by an adaptive algorithm similar to how the Inferno/Dis garbage collector does things). Profiling an ApacheBench of 100 requests on news.arc had the following rather interesting results:

Flat profile:

Each sample counts as 0.01 seconds.
  %   cumulative   self              self     total           
 time   seconds   seconds    calls   s/call   s/call  name    
 13.04      2.34     2.34     9911     0.00     0.00  gc
  7.69      3.72     1.38 57807893     0.00     0.00  envval
  5.24      4.66     0.94 71028697     0.00     0.00  MARKPROP
  4.40      5.45     0.79 64935366     0.00     0.00  nextcont
  3.85      6.14     0.69 65820037     0.00     0.00  contenv
  3.65      6.80     0.66  5852362     0.00     0.00  arc_hash_final
  3.09      7.35     0.56 48743506     0.00     0.00  TYPE
  2.90      7.87     0.52 139967108     0.00     0.00  TYPE
  2.56      8.33     0.46  7125944     0.00     0.00  bibop_alloc
  2.29      8.74     0.41 64390760     0.00     0.00  TYPE
  2.23      9.14     0.40  1116526     0.00     0.00  __arc_update_cont_envs
  1.95      9.49     0.35 13605816     0.00     0.00  arc_hash_update
  1.95      9.84     0.35 71028617     0.00     0.00  __arc_wb
  1.87     10.18     0.34  4605836     0.00     0.00  arc_restorecont
  1.78     10.50     0.32  1371627     0.00     0.00  __arc_vmengine
  1.62     10.79     0.29  6200889     0.00     0.00  __arc_mkenv
  1.62     11.08     0.29  5534865     0.00     0.00  hash_lookup

The envval function, which now takes 1.38 seconds of the total run time and was called over 57 million times, is the function that obtains environment values from the stack. Looking further down into the call graph, the following is seen:

                0.12    0.05 4891773/57807893     __arc_menv [39]
                0.30    0.13 12475968/57807893     __arc_putenv [30]
                0.97    0.41 40440152/57807893     __arc_getenv [12]
[10]    11.0    1.38    0.59 57807893         envval [10]
                0.37    0.00 57807893/64390760     TYPE [41]
                0.21    0.00 57807893/71107210     TENVR [54]
                0.00    0.01  551924/3639885     nextenv [113]

40 million of those calls came from __arc_getenv, to read environment variables. The current __arc_putenv/__arc_getenv functions are general and can obtain environments everywhere and do checks for parent environments, so perhaps the next optimisation we can try will be to make special case VM instructions for getting variables from a function’s own local environment (as opposed to its parent environments), and inline functions that do the same. We’ll embed those as inline functions in the virtual machine itself for greater efficiency. Reading and writing function local variables seems to be done quite a lot, and it should be something doable with the greatest efficiency, preferably without even the overhead of a function call.

Arcueid 0.1.2 released

•2013-05-14 • Leave a Comment

Arcueid 0.1.2 is out. You can download it here.

Lots of new features in this release. I’ve finally fixed the nested quasiquotes bug that has been out for more than a year. Readline seems to be working better than it had in 0.1.1. Some bugfixes like memory leak issues and garbage collector problems. We now have gone back to using the Huelsbergen-Winterbottom VCGC algorithm for garbage collection. Large file support is now available even on 32-bit systems, provided bignum support is also available. We have some regular expressions based on the Plan 9/Inferno Regular Expression library (and it is for now similarly limited). The load function is now built into the interpreter and load paths are now supported. And now there is a script mode so Arcueid can be used in a shebang, e.g. it is now possible to run a file with the executable bit set with the following contents

#!/usr/local/bin/arcueid --script
(prn (+ 1 1))

and that does what one expects.

Regular expressions are provided by means of the r/…/ syntax. Only basic stuff like character classes, the Kleene star and plus operators, the ? operator, alternation, and capturing groups are supported. Many Perl/POSIX constructions are still unavailable, but soon enough the regexp support should evolve to support many of the most useful features. Hope to soon be able to add things like counted repetitions, character class abbreviations like \d, and non-capturing groups. To use regexps, they can be applied to strings, e.g.:

arc> (r/(abc)(def)/ "zzzabcdefgh")
(3 ("abcdef" "abc" "def"))

Which returns a list with the position of the match, and a list of all the capture groups, starting with the entire string matched by the regular expression. Returns nil if the regexp failed to match. The =~ macro can be used, which binds $$ to the position in the string the match obtained, and $0 to the whole matched portion, $1 to the first capture, and so on, similar to the way Perl does it, e.g.:

arc> (=~ r/(abc)(def)/ "zzzabcdefgh" (list $1 $2))
("abc" "def")

Load paths can be added by means of the loadpath-add function, which adds a directory to the load path list loadpath*.

Regular expressions progress

•2013-05-09 • Leave a Comment

Looks like I’m going to wind up using the Plan 9 regexp library after all. After reading some very informative articles by Russ Cox (this, this, and this), I think I now understand the internal workings of the Plan 9 regexp library well enough to be able to modify it to first of all work on Arcueid strings, and then add in some of the functionality to make it more useful, though it will still not be entirely compatible with Perl or POSIX regexps.  In particular, I don’t think I ever want to make Arcueid regexps support backreferences, which can only be implemented by the backtracking algorithm. However, I do hope to add in enough support for Perl/POSIX regexp features to make it sufficiently useful for most workaday uses of regexps.

I’ve also changed regexp syntax slightly. A regexp in Arcueid now looks sorta like r/…/. It seems that there’s no easy way to unambiguously parse a regexp in the Perl-like /…/ syntax and at the same time have operators like / for division. At least the r in front makes it unambiguous.

While at the moment Arcueid’s strings are simple arrays of Unicode runes (UCS-4), I have plans to make them something much more sophisticated, by turning them into Ropes. This should also permit much more efficient use on strings of the operations on conses like car and cdr, which is something that Paul Graham has considered.

Regular Expressions in Arc

•2013-05-06 • 1 Comment

Seeing as regular expressions are so useful in my day to day programming in both Perl and Ruby, I think it would be a useful thing to have them built into Arcueid as well, but as I am doing this I wonder at how exactly it should be done.

Obviously, matching a regex should be just as easy as applying the regex, e.g. (/(a?)a?(a?aaa)/ "aaaaaaaaaaaaaaa") should be sufficient, but then what is the value of such an expression?  I suppose it should of course be nil if the match fails, but then what happens if the match succeeds?

There are several ways we could go about it:

  1. Just return an integer with the position in the string at which the match began, and then implicitly bind $1 to be the first captured sub-string, $1 to be the second, and so forth.  These variables are essentially global in scope.  It is how Perl and Ruby do it though.
  2. Make a regex match return a list with the position at which the match began and any sub-string captures (backreferences) that were created.  So you can do something like: (let (pos sa sb sc) (/(a+)(b+)(c+)/ str) ... ). I do think this would be cleaner, and it would be possible I think to write a macro where one could do: (=~ /(a+)(b+)(c+)/ str ... $1 ... $2) where $1, $2, and $3 would be bound as in Perl.

And then there is the matter of a regular expression library.  I’m currently trying to adapt the Plan 9 regular expression library but have found that it doesn’t support a lot of the stuff I’ve come to expect from regular expressions: in particular, it doesn’t even support something as simple as non-capturing groups, i.e. (?: ...).

There are several other libraries I’ve been looking at:

  1. Oniguruma.  Suffers from two issues: first, it’s bigger than the rest of Arcueid at the moment.  Second, it does recursive backtracking and can take exponential time with some regexes.  Try doing /a?a?a?a?a?a?a?a?a?a?a?a?a?a?a?a?a?a?a?a?a?a?a?a?a?a?a?a?a?aaaaaaaaaaaaaaaaaaaaaaaaaaaaa/ =~ "aaaaaaaaaaaaaaaaaaaaaaaaaaaaa" in Perl or Ruby that uses the same algorithm.  That simple match takes over a minute on my 2.13 GHz Core i3 machine!  The same match is almost instantaneous using the Plan 9 regex library.
  2. libpcre.  Suffers from the same exponential time backtracking problems as Oniguruma.  Is also three-clause BSD, so I can’t use it without changing my license as well.
  3. TRE.  Seems to be better, but I’ll need to pare it down to something more minimal.

Other suggestions on a small (<5000 SLOC), efficient, Unicode-capable regex library are welcome. Text processing is something web apps do a lot of, and one of the basic tools for doing that is regular expressions. This, I think deserves to be baked into the language runtime even more than support for big numbers (although that's something that might become baked in sometime soon too). Arcueid may even get support for taint mode à la Perl after this.

Changing garbage collectors

•2013-05-02 • Leave a Comment

Well, it seems that replacing the old garbage collector with VCGC improved matters markedly. The results of ApacheBench with the new garbage collector are as follows:


$ ab -n 100 -c 1 http://localhost:8080/news
This is ApacheBench, Version 2.3
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/

Benchmarking localhost (be patient).....done

Server Software:
Server Hostname: localhost
Server Port: 8080

Document Path: /news
Document Length: 2260 bytes

Concurrency Level: 1
Time taken for tests: 8.271 seconds
Complete requests: 100
Failed requests: 0
Write errors: 0
Total transferred: 233400 bytes
HTML transferred: 226000 bytes
Requests per second: 12.09 [#/sec] (mean)
Time per request: 82.710 [ms] (mean)
Time per request: 82.710 [ms] (mean, across all concurrent requests)
Transfer rate: 27.56 [Kbytes/sec] received

Connection Times (ms)
min mean[+/-sd] median max
Connect: 0 0 0.0 0 0
Processing: 72 83 20.6 76 193
Waiting: 41 49 14.9 44 134
Total: 72 83 20.6 76 193

Percentage of the requests served within a certain time (ms)
50% 76
66% 77
75% 79
80% 83
90% 98
95% 132
98% 169
99% 193
100% 193 (longest request)

This is a rather major improvement I should think. Using VCGC improved matters by a factor of nearly 3 on average, and is only 1.6 times slower than disabling garbage collection completely. It’s still a long way from Anarki which can still get 1.865 ms on average on my machine.

Here’s what the profiler finds:


Each sample counts as 0.01 seconds.
% cumulative self self total
time seconds seconds calls s/call s/call name
10.07 2.24 2.24 3185 0.00 0.00 gc
8.34 4.10 1.86 144932952 0.00 0.00 TYPE
6.97 5.65 1.55 57782583 0.00 0.00 envval
4.63 6.68 1.03 88860737 0.00 0.00 nextcont
4.00 7.57 0.89 83769798 0.00 0.00 TYPE
3.69 8.39 0.82 89950076 0.00 0.00 contenv
3.53 9.17 0.79 69084312 0.00 0.00 mark
3.33 9.91 0.74 6093257 0.00 0.00 arc_hash_final
3.19 10.62 0.71 188313998 0.00 0.00 TYPE
3.06 11.30 0.68 72275198 0.00 0.00 MARKPROP
2.83 11.93 0.63 1129927 0.00 0.00 __arc_update_cont_envs
2.38 12.46 0.53 7127191 0.00 0.00 bibop_alloc
1.80 12.86 0.40 15619398 0.00 0.00 arc_hash_update
1.75 13.25 0.39 434522 0.00 0.00 __arc_vector_marker
1.66 13.62 0.37 40391647 0.00 0.00 __arc_getenv
1.53 13.96 0.34 71091716 0.00 0.00 TENVR
1.53 14.30 0.34 1436710 0.00 0.00 __arc_vmengine
1.39 14.61 0.31 72274982 0.00 0.00 __arc_wb

The garbage collector is still the largest component, but its proportion has gone down from nearly 30% from the previous version to only 10%. The next largest contributor is the TYPE function, which is supposed to be inlined. It seems the system is spending 8% of its time doing type dispatching. It makes 145 million such calls when attempting to serve the news page 100 times, so anything we can do to either reduce the number of times this is called or make it run faster will improve matters greatly.

However, I think this is a battle we can fight after the next release, and before then I’ll work on adding some small improvements as well, such as command line parameters, load paths, additional functions, and so forth.

Profiling Arcueid

•2013-04-30 • Leave a Comment

After compiling Arcueid with profiling information, and running news.arc with ApacheBench to generate 100 requests, the results seem to be rather interesting, although not quite unexpected:


Flat profile:

Each sample counts as 0.01 seconds.
% cumulative self self total
time seconds seconds calls s/call s/call name
28.50 11.29 11.29 287 0.04 0.09 gc
11.37 15.80 4.51 197333024 0.00 0.00 TYPE
3.91 17.35 1.55 3340957 0.00 0.00 __arc_vector_marker
3.89 18.89 1.54 35946478 0.00 0.00 arc_hash_update
3.57 20.30 1.42 190118632 0.00 0.00 mark
3.36 21.63 1.33 57789411 0.00 0.00 envval
2.85 22.76 1.13 88862762 0.00 0.00 nextcont
2.52 23.76 1.00 9775784 0.00 0.00 arc_hash_final
2.32 24.68 0.92 95741775 0.00 0.00 TYPE
2.09 25.51 0.83 142012842 0.00 0.00 VINDEX
1.92 26.27 0.76 89952347 0.00 0.00 contenv
1.77 26.97 0.70 188319700 0.00 0.00 TYPE
1.74 27.66 0.69 9540002 0.00 0.00 hash_lookup
1.72 28.34 0.68 287 0.00 0.00 free_unused_bibop
1.50 28.94 0.60 11201327 0.00 0.00 cons_marker
1.41 29.50 0.56 1130474 0.00 0.00 __arc_update_cont_envs
1.16 29.96 0.46 7119452 0.00 0.00 bibop_alloc
1.14 30.41 0.45 19305490 0.00 0.00 VINDEX

It seems that almost 30% of the system’s time is being spent in the garbage collector. I suppose the next step really is to make use of a more efficient GC algorithm, or to use what we have today less aggressively.

Nested quasiquotes

•2013-04-26 • Leave a Comment

There has been a long-standing issue within Arcueid: dealing with nested quasiquotes. Apparently reference Arc doesn’t handle this case all too well either, and as such none of the code that I use for compatibility testing like news.arc and others makes use of nested quasiquotes either. However, it seems that macro expansions in general might well result in cases where nested quasiquotes can arise, so it is important that this issue be fixed. Anyhow, in my searches I ran across a link to an implementation of a Common Lisp-style quasiquote as Arc macros. It’s an interesting exercise converting it to C… I suppose you might say that this is an example of a macro in Arcueid’s C code, however, it isn’t precisely implemented that way, but as a hook in the compiler, as this is a fundamental part of Arc’s functionality.

The basic example of a nested quasiquote is the following:

``(+ 1 ,,@(list 2 3) 4)

Current Arc 3.1 and Anarki both produce (+ 1 2 4) when asked to evaluate that. Common Lisp produces (+ 1 2 3 4), which is, I think, what we should be doing. Scheme (Racket) produces an error when presented with the same. Arcueid 0.1.1 and below produce a similar error. Akkartik mentions that this was working in some previous version of reference Arc, though he can’t seem to remember when. I think we should produce the same results as Common Lisp for nested quasiquotes.

 
Follow

Get every new post delivered to your Inbox.