More performance tuning
After a few more trials, I noticed three things: the GC quantum is set far too high, we are using the full-blown TYPE function just to decide whether something is a symbol, and the GC winds up running a lot. I made a few minor changes that improved performance slightly. Reducing the GC quantum from 4096 objects checked per cycle to 768 had some effect (for the final cut in the next formal release I think I’ll set the GC quantum with an adaptive algorithm, similar to what the Inferno/Dis garbage collector does). Profiling an ApacheBench run of 100 requests against news.arc gave the following rather interesting results:
Flat profile:

Each sample counts as 0.01 seconds.

  %   cumulative    self                 self    total
 time   seconds   seconds       calls   s/call   s/call  name
13.04      2.34     2.34        9911     0.00     0.00  gc
 7.69      3.72     1.38    57807893     0.00     0.00  envval
 5.24      4.66     0.94    71028697     0.00     0.00  MARKPROP
 4.40      5.45     0.79    64935366     0.00     0.00  nextcont
 3.85      6.14     0.69    65820037     0.00     0.00  contenv
 3.65      6.80     0.66     5852362     0.00     0.00  arc_hash_final
 3.09      7.35     0.56    48743506     0.00     0.00  TYPE
 2.90      7.87     0.52   139967108     0.00     0.00  TYPE
 2.56      8.33     0.46     7125944     0.00     0.00  bibop_alloc
 2.29      8.74     0.41    64390760     0.00     0.00  TYPE
 2.23      9.14     0.40     1116526     0.00     0.00  __arc_update_cont_envs
 1.95      9.49     0.35    13605816     0.00     0.00  arc_hash_update
 1.95      9.84     0.35    71028617     0.00     0.00  __arc_wb
 1.87     10.18     0.34     4605836     0.00     0.00  arc_restorecont
 1.78     10.50     0.32     1371627     0.00     0.00  __arc_vmengine
 1.62     10.79     0.29     6200889     0.00     0.00  __arc_mkenv
 1.62     11.08     0.29     5534865     0.00     0.00  hash_lookup
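To make the term concrete: the GC quantum caps how many objects the collector checks each time it runs, so a smaller quantum means shorter but more frequent pauses. The following is only a minimal sketch in C of a quantum-bounded step. The names struct obj, struct gc_state and gc_step, and the simple linked list of objects to visit, are assumptions made up for illustration and are not Arcueid's actual collector.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified object header (not Arcueid's real layout). */
struct obj {
    struct obj *next;   /* next object still waiting to be visited */
    int marked;         /* set once the collector has checked it */
};

/* Hypothetical collector state. */
struct gc_state {
    struct obj *pending;  /* objects not yet visited in this cycle */
    size_t quantum;       /* maximum objects checked per gc_step call */
};

/* Check at most gc->quantum objects, then return so the VM can resume.
   Returns the number of objects actually checked. */
static size_t gc_step(struct gc_state *gc)
{
    size_t visited = 0;

    assert(gc != NULL);
    while (gc->pending != NULL && visited < gc->quantum) {
        struct obj *o = gc->pending;
        gc->pending = o->next;
        o->marked = 1;
        visited++;
    }
    return visited;
}
```

An adaptive scheme would adjust the quantum field between calls instead of leaving it fixed; how the adjustment is chosen is deliberately left out here.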
The envval function, which now takes 1.38 seconds of the total run time and was called over 57 million times, is the function that obtains environment values from the stack. Looking further down into the call graph, the following is seen:
index  % time    self  children    called                name
                 0.12      0.05   4891773/57807893       __arc_menv [39]
                 0.30      0.13  12475968/57807893       __arc_putenv [30]
                 0.97      0.41  40440152/57807893       __arc_getenv [12]
[10]     11.0    1.38      0.59  57807893            envval [10]
                 0.37      0.00  57807893/64390760       TYPE [41]
                 0.21      0.00  57807893/71107210       TENVR [54]
                 0.00      0.01    551924/3639885        nextenv [113]
About 40 million of those calls came from __arc_getenv, which reads environment variables. The current __arc_putenv/__arc_getenv functions are general: they can reach any environment and they check parent environments. So perhaps the next optimisation to try is special-case VM instructions for accessing variables in a function’s own local environment (as opposed to its parent environments), implemented as inline functions embedded in the virtual machine itself. Reading and writing function-local variables happens a lot, and it ought to be as efficient as possible, preferably without even the overhead of a function call.
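As a rough illustration of the idea, here is a minimal sketch in C contrasting a general lookup that walks parent environments with an inline accessor for the local frame only. Everything in it is hypothetical: the struct env layout, the value typedef, and the names env_get, env_get_local and env_put_local are stand-ins, not Arcueid's real data structures or the signatures of __arc_getenv/__arc_putenv.

```c
#include <assert.h>
#include <stddef.h>

typedef long value;  /* stand-in for the interpreter's value type */

/* Hypothetical environment frame: a fixed slot array plus a parent link. */
struct env {
    struct env *parent;
    size_t nvars;
    value vars[8];
};

/* General path: walk `depth` parent links, then index into that frame. */
static value env_get(struct env *e, size_t depth, size_t index)
{
    while (depth-- > 0) {
        assert(e->parent != NULL);
        e = e->parent;
    }
    assert(index < e->nvars);
    return e->vars[index];
}

/* Fast path: the variable is known to live in the current frame,
   so no parent walk is needed and the compiler can inline the access. */
static inline value env_get_local(const struct env *e, size_t index)
{
    return e->vars[index];
}

static inline void env_put_local(struct env *e, size_t index, value v)
{
    e->vars[index] = v;
}
```

A compiler for the VM could emit the local form whenever it can tell at compile time that a variable reference has depth zero, and fall back to the general form otherwise.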