random-state.net / Array Improvements (May 18th 2009)

Array Improvements #
hacking, May 18th 2009

I've just about completed a slew of array related optimizations in SBCL. Since the details are not all that interesting, I'll focus on couple of my favorite changes:

MAKE-ARRAY already had some compiler transforms that were able to take care of some aspects of it, but there were some IMO perfectly idiomatic cases that were dog slow:

;; the compiler didn't realize it could elide the call to LIST and use
;; it's arguments directly to initialize the array — ditto for VECTOR
;; and backquoted forms in place of LIST.
(defun bar (x y z)
  (make-array 3 :initial-contents (list x y z)))

Previously this resulted in consing up the list at runtime, and then making a full call to MAKE-ARRAY, including keyword argument parsing and everything. I'll spare you the disassembly — pick up an pre 1.0.28.51 SBCL and see for yourself if you must.

Now, we are instead able to transform the above into something like this:

(lambda (x y z)
  (initialize-vector (allocate-vector ...) x y z))

..which in turn is open coded, resulting in an almost 90% speedup. If you were aware of this deficiency, and used explicit (SETF AREF) to initialize the array, you were manually doing what the compiler now does automatically — and will not see any performance improvements.

What's more, now we are also able to stack allocate arrays in the presence of :INITIAL-CONTENTS or :INITIAL-ELEMENT, including those allocated with a call to VECTOR.

FILL also now benefits from a better transform. If the array element type is known (and fills a bunch of criteria), we are able to fill much more efficiently by using "bit bashing". The implementation functions for this have existed in SBCL for quite a while now, thanks to Nathan Froyd, but were not used until now.

Consider eg. an array with element type (UNSIGNED-BYTE 8). By first filling a word with 4 (8 on 64 bit platforms) copies of the initial element, we can fill the array a word at a time — poor man's SIMD, if you will. This strategy was already used for filling bit vectors, but now it applies to all types where it is safe.

Upshot: upto 90% faster FILL.