random-state.net / SBCL Numeric Performance Today (November 16th 2011)

SBCL Numeric Performance Today
hacking, November 16th 2011

I've been working on and off on a new microbenchmark tool, primarily for SBCL but usable for other implementations as well. Last night I finally got around to teaching it how to generate pretty pictures using the Google Visualization API, and wrote a number of microbenchmarks that show the variety of numeric performance in SBCL.

Here it is:

[Chart: Performance of addition variants in SBCL 1.0.53.31]

The higher the bar, the better the performance: the numbers are millions of iterations of the benchmark per second of run-time.

Each benchmark does a roughly comparable task: it adds two numbers. What varies is the types of the numbers and how much the compiler knows about the situation. (In some benchmarks there may be an extra comparison or so per iteration to keep the compiler from flushing the code as effectless.) There are basically four classes of performance:

Superb: Modular inline integer arithmetic. This is performance essentially identical to what you'd expect from C or ASM.
Good: The compiler knows the types, the argument types are inline-arithmetic-friendly, the result type is not in doubt (unlike the addition of two fixnums, which can produce a bignum), and the function doing the addition is inlined at the site where the result is unboxed and used.
Decent: The compiler knows the types, the types are inline-arithmetic-friendly and have an immediate representation, but the function doing the addition is out of line.
Bad: Generic arithmetic on anything other than fixnums small enough for the result to be a fixnum is just not that great.
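To make the four classes a bit more concrete, here is a rough sketch of what an addition in each class might look like. These are not the benchmarks from the chart: every function name below is made up for illustration, and how well each one actually compiles depends on the SBCL version and platform.

```lisp
;; Hypothetical examples, one per performance class.

;; Superb: modular arithmetic. The result is masked to 64 bits, so on
;; x86-64 SBCL can typically use a plain machine add and never needs
;; to consider a bignum result.
(defun modular+ (a b)
  (declare (type (unsigned-byte 64) a b))
  (ldb (byte 64 0) (+ a b)))

;; Good: declared double-floats, and the function is declaimed inline
;; so that a caller can keep arguments and result unboxed.
(declaim (inline double-inline+))
(defun double-inline+ (a b)
  (declare (type double-float a b))
  (+ a b))

;; Decent: immediate (fixnum-range) types are known, but every call is
;; a full out-of-line call.
(declaim (notinline small-outline+))
(defun small-outline+ (a b)
  (declare (type (signed-byte 32) a b))
  (+ a b))

;; Bad: generic arithmetic. Nothing is known about the arguments, so
;; the addition has to dispatch on their types at run time.
(defun generic+ (a b)
  (+ a b))
```

All four return the mathematically expected sum, except that modular+ deliberately wraps around at 2^64; the difference between them is in the code the compiler can generate, not in the results.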

What should be of interest to anyone optimizing floating point performance is that type-checking doesn't really cost anything measurable most of the time. All of those benchmarks do full type checks except for double-unsafe-sans-result+, and the gain over the safe variant is minuscule.
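As a rough illustration of the safe versus unsafe distinction, here is a minimal sketch. The function names are hypothetical and this is not necessarily how double-unsafe-sans-result+ is written. Under SBCL's default policy, type declarations are treated as assertions and checked at run time; with (safety 0) the compiler simply trusts them.

```lisp
;; Hypothetical names, for illustration only.

;; Default policy: in SBCL the declared argument types are checked on
;; entry, so passing a non-double should signal a TYPE-ERROR.
(defun double-safe+ (a b)
  (declare (type double-float a b))
  (+ a b))

;; SAFETY 0: the declarations are trusted and not checked. Passing
;; anything but double-floats here has undefined consequences.
(defun double-unsafe+ (a b)
  (declare (type double-float a b)
           (optimize (safety 0)))
  (+ a b))
```

Given how small the measured gain is, the checked version looks like the sensible default, with (safety 0) reserved for the rare spot where profiling says it matters.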

What matters is that you generate inline arithmetic so that your floating points don't get boxed. On x86-64 SBCL has immediate single-floats, so occasional boxing isn't quite as disastrous (compare single+ and double+), but getting rid of the boxed representations completely is a huge win: just compare single+ to complex-double-inline+.
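A small sketch of the boxed versus unboxed difference, again with made-up names and not taken from the benchmark suite: the first function sums a specialized double-float array with a declared accumulator, which gives the compiler a chance to keep everything in raw machine representation; the second sums a general vector, where each element is a pointer to a heap-allocated double and the additions go through generic arithmetic.

```lisp
;; Hypothetical names, for illustration only.

;; Unboxed: specialized array storage plus a declared accumulator, so
;; the additions can be compiled as inline double-float arithmetic.
(defun sum-unboxed (vector)
  (declare (type (simple-array double-float (*)) vector))
  (let ((sum 0d0))
    (declare (type double-float sum))
    (dotimes (i (length vector) sum)
      (incf sum (aref vector i)))))

;; Boxed: a general vector holds references to boxed numbers, and the
;; compiler knows nothing about the element types.
(defun sum-boxed (vector)
  (declare (type simple-vector vector))
  (let ((sum 0))
    (dotimes (i (length vector) sum)
      (setf sum (+ sum (svref vector i))))))

;; Helper that builds both representations from the same list of doubles.
(defun make-test-vectors (list)
  (values (make-array (length list) :element-type 'double-float
                                    :initial-contents list)
          (coerce list 'simple-vector)))
```

Wrapping calls to each function in (time ...) on a large vector is a quick way to compare the two on your own machine, and (disassemble 'sum-unboxed) shows whether the loop really stays free of allocation.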

Postscript: I know not everyone reading this will be clear on unboxed, boxed, immediate, non-immediate, etc. My apologies. I will try to remedy the situation and write about the different representations and how and why they matter at a later date.

Post-Postscript: I will be publishing the benchmark tool once it settles down, and once I have a chance to test-drive it with something besides SBCL. Could be a while, though. If you urgently need it, get in touch and we'll arrange something.