More performance tuning

After a few more trials, I noticed three things: the GC quantum was set far too high, we were using the full-blown TYPE function just to decide whether something is a symbol, and the GC actually winds up running a lot. So I made a few minor changes that improved performance slightly. Reducing the GC quantum from 4096 objects checked per cycle to 768 had some effect (for the final cut in the next formal release I think I’ll set the GC quantum with an adaptive algorithm, similar to how the Inferno/Dis garbage collector does things). Profiling an ApacheBench run of 100 requests against news.arc gave the following rather interesting results:

Flat profile:

Each sample counts as 0.01 seconds.
  %   cumulative   self              self     total           
 time   seconds   seconds    calls   s/call   s/call  name    
 13.04      2.34     2.34     9911     0.00     0.00  gc
  7.69      3.72     1.38 57807893     0.00     0.00  envval
  5.24      4.66     0.94 71028697     0.00     0.00  MARKPROP
  4.40      5.45     0.79 64935366     0.00     0.00  nextcont
  3.85      6.14     0.69 65820037     0.00     0.00  contenv
  3.65      6.80     0.66  5852362     0.00     0.00  arc_hash_final
  3.09      7.35     0.56 48743506     0.00     0.00  TYPE
  2.90      7.87     0.52 139967108     0.00     0.00  TYPE
  2.56      8.33     0.46  7125944     0.00     0.00  bibop_alloc
  2.29      8.74     0.41 64390760     0.00     0.00  TYPE
  2.23      9.14     0.40  1116526     0.00     0.00  __arc_update_cont_envs
  1.95      9.49     0.35 13605816     0.00     0.00  arc_hash_update
  1.95      9.84     0.35 71028617     0.00     0.00  __arc_wb
  1.87     10.18     0.34  4605836     0.00     0.00  arc_restorecont
  1.78     10.50     0.32  1371627     0.00     0.00  __arc_vmengine
  1.62     10.79     0.29  6200889     0.00     0.00  __arc_mkenv
  1.62     11.08     0.29  5534865     0.00     0.00  hash_lookup
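
With gc still at the top of the profile, the adaptive quantum idea mentioned above could look roughly like the sketch below. This is not the Inferno/Dis algorithm and not Arcueid's actual code, only a simple feedback rule under stated assumptions: the names gc_tuner and gc_next_quantum, the bounds, and the growth and shrink factors are all hypothetical. It assumes the collector can report how many objects were allocated and how many it visited since the last cycle.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical bounds. 4096 is the old quantum mentioned in the post;
   the lower bound is an arbitrary choice for illustration. */
#define QUANTUM_MIN 128
#define QUANTUM_MAX 4096

/* Hypothetical tuner state, not an Arcueid structure. */
struct gc_tuner {
    size_t quantum;   /* objects to visit in the next GC cycle */
};

/* Pick the next quantum from what happened since the last cycle.
   allocated: objects allocated by the program since the last cycle
   visited:   objects the collector visited in the last cycle */
static size_t gc_next_quantum(struct gc_tuner *t, size_t allocated, size_t visited)
{
    assert(t != NULL);
    size_t q = t->quantum;

    if (allocated > visited)
        q += q / 2;               /* collector is falling behind: grow by 50% */
    else if (allocated < visited / 2)
        q -= q / 4;               /* collector is well ahead: shrink by 25% */

    /* Clamp so the quantum never collapses or runs away. */
    if (q < QUANTUM_MIN) q = QUANTUM_MIN;
    if (q > QUANTUM_MAX) q = QUANTUM_MAX;

    t->quantum = q;
    return q;
}
```

The point of a rule like this is that the collector does more work per cycle only when allocation is outpacing it, rather than using one fixed number for every workload.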

The envval function, which now accounts for 1.38 seconds of self time and was called over 57 million times, is the function that obtains environment values from the stack. Further down, the call graph shows where those calls come from:

                0.12    0.05 4891773/57807893     __arc_menv [39]
                0.30    0.13 12475968/57807893     __arc_putenv [30]
                0.97    0.41 40440152/57807893     __arc_getenv [12]
[10]    11.0    1.38    0.59 57807893         envval [10]
                0.37    0.00 57807893/64390760     TYPE [41]
                0.21    0.00 57807893/71107210     TENVR [54]
                0.00    0.01  551924/3639885     nextenv [113]

About 40 million of those calls came from __arc_getenv, which reads variables out of environments. The current __arc_putenv/__arc_getenv functions are general: they can reach any environment and they check for parent environments. So perhaps the next optimisation to try is adding special-case VM instructions for accessing variables in a function’s own local environment (as opposed to its parent environments), backed by inline functions embedded in the virtual machine itself for greater efficiency. Reading and writing function-local variables seems to happen quite a lot, so it should be as efficient as possible, preferably without even the overhead of a function call.
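
To make the distinction concrete, here is a minimal sketch of a general lookup next to a local-only fast path. Everything in it is an assumption for illustration: the value typedef, struct env, and the env_get, env_get_local and env_put_local names are hypothetical, and the frames here are plain structs rather than the stack-resident environments Arcueid actually uses.

```c
#include <assert.h>
#include <stddef.h>

typedef long value;              /* stand-in for the interpreter's value type */

/* Hypothetical environment frame, not Arcueid's real layout. */
struct env {
    struct env *parent;          /* enclosing environment, NULL at top level */
    size_t nvars;                /* number of variable slots in this frame */
    value *vars;                 /* the slots themselves */
};

/* General path: walk `depth` parents up the chain, then index.
   This is the shape of a lookup that can reach any environment. */
static value env_get(struct env *e, size_t depth, size_t index)
{
    while (depth-- > 0) {
        assert(e->parent != NULL);
        e = e->parent;
    }
    assert(index < e->nvars);
    return e->vars[index];
}

/* Fast path for a hypothetical "load local" instruction:
   no parent walk, just an index into the current frame. */
static inline value env_get_local(const struct env *e, size_t index)
{
    assert(index < e->nvars);
    return e->vars[index];
}

/* Matching fast path for a hypothetical "store local" instruction. */
static inline void env_put_local(struct env *e, size_t index, value v)
{
    assert(index < e->nvars);
    e->vars[index] = v;
}
```

A compiler for the VM could emit the local instructions whenever it knows a variable lives at depth 0, and fall back to the general ones otherwise, so the common case is a single indexed load or store.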

~ by stormwyrm on 2013-05-15.

Posted in arcueid, devnotes
