random-state.net

Nikodemus Siivola

| top | previous >>

Efficient Doesn't Equal Performant #
hacking, March 16th 2013

Edit: Philipp Marek kindly pointed out that where I claimed to have 1M entries, I only had 100k. Oops! This has been corrected below, other numbers altered correspondingly, and everything rerun. My point still stands.

This is a bit of a rant, I guess. Sorry about that.

A few years back I needed a binary heap, and I needed one that was fast and thread safe. So I wrote Pileup.

There are other heaps for Common Lisp, and some of them support operations Pileup doesn't implement out of the box, and all of them claim to be efficient.

...and I'm sure that algorithmically they are. However, constant factors matter more often than you might think. I'll single out CL-HEAP primarily because it has such an authoritative name. :)

A tiny benchmark:

(defvar *100k-numbers* (loop repeat 100000
                             collect (random most-positive-fixnum)))

(defvar *400-numbers* (loop repeat 400
                            collect (random most-positive-fixnum)))

(defun insert-and-pop (isert pop heap things)
  (declare (function insert pop))
  (dolist (thing things)
    (funcall insert heap thing))
  (loop for thing = (funcall pop heap)
        while thing))

(defun make-insert-and-pop (make insert pop things)
  (declare (function make))
  (insert-and-pop insert pop (funcall make) things))

(defun time-insert-and-pop (insert pop heap things)
  ;; Time 4 runs.
  (time
   (loop repeat 4 do (insert-and-pop insert pop heap things)))
  t)

(defun time-make-insert-and-pop (make insert pop things)
  ;; Time 1000 runs.
  (time
   (loop repeat 1000 do (make-insert-and-pop make insert pop things)))
  t)

(defun cl-heap-make ()
  (make-instance 'cl-heap:priority-queue))

(defun cl-heap-insert (heap thing)
  (cl-heap:enqueue heap thing thing))

(defun cl-heap-pop (heap)
  (cl-heap:dequeue heap))

(defun pileup-make ()
  (pileup:make-heap #'<))

(defun pileup-insert  (heap thing)
  (pileup:heap-insert thing heap))

(defun pileup-pop (heap)
  (pileup:heap-pop heap))

;;; CL-HEAP: insert and pop 100k numbers x 4 into a single queue
(time-insert-and-pop #'cl-heap-insert
                     #'cl-heap-pop
                     (cl-heap-make)
                     *100k-numbers*)

;;; PILEUP: insert and pop 100k numbers x 4 into a single queue
(time-insert-and-pop #'pileup-insert
                     #'pileup-pop
                     (pileup-make)
                     *100k-numbers*)

;;; CL-HEAP: make 1k heaps, insert and pop 400 numbers into each
(time-make-insert-and-pop #'cl-heap-make
                          #'cl-heap-insert
                          #'cl-heap-pop
                          *400-numbers*)

;;; PILEUP: make 1k heaps, insert and pop 400 numbers into each
(time-make-insert-and-pop #'pileup-make
                          #'pileup-insert
                          #'pileup-pop
                          *400-numbers*)

Results: (warmup run, then median of three runs for each)

;;; CL-HEAP: insert and pop 100k numbers x 4 into a single queue
Evaluation took:
  6.038 seconds of real time
  5.999279 seconds of total run time (5.808912 user, 0.190367 system)
  [ Run times consist of 0.355 seconds GC time, and 5.645 seconds non-GC time. ]
  99.35% CPU
  15,660,756,020 processor cycles
  208,397,472 bytes consed

;;; PILEUP: insert and pop 100k numbers x 4 into a single queue
Evaluation took:
  0.430 seconds of real time
  0.431266 seconds of total run time (0.429962 user, 0.001304 system)
  100.23% CPU
  1,115,426,026 processor cycles
  3,053,536 bytes consed

;;; CL-HEAP: make 1k heaps, insert and pop 4k numbers
Evaluation took:
  2.830 seconds of real time
  2.839067 seconds of total run time (2.829425 user, 0.009642 system)
  [ Run times consist of 0.031 seconds GC time, and 2.809 seconds non-GC time. ]
  100.32% CPU
  7,338,649,539 processor cycles
  182,811,168 bytes consed

;;; PILEUP: make 1k heaps, insert and pop 4k numbers
Evaluation took:
  0.301 seconds of real time
  0.301773 seconds of total run time (0.300767 user, 0.001006 system)
  100.33% CPU
  779,991,108 processor cycles
  12,540,864 bytes consed

(I was also going to compare parallel performance, but CL-HEAP doesn't appear to be thread-safe, so...)

This is not to disparage CL-HEAP: it supports things which Pileup doesn't, but it clearly isn't written with constant factors in mind, and this shows.

Constant factors matter.

(Admittedly, I tested this only on SBCL, and it might turn out that CL-HEAP does a lot better — and Pileup a lot worse — on some other implementation. This does not alter my main contention that you ignore constant factors at your own peril.)