[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: Compilation of methods per class.

There definitely may be something to the differences in method lookup between
different ports of PCL being a partial explanation to the problem.
Calls on generic functions in Franz CL on a Sun 4 seem much
less likely to break (i.e. slow down) when those functions are
specialized to multiple classes than their equivalent calls in Lucid 3.0
on the Apollos and Sun 4s.

Here is about as small a benchmark I can make to show the problem significantly:


(defclass everybody-inherit-class () ())

(defclass primary-class1 (everybody-inherit-class) ())
(defclass primary-class2 (everybody-inherit-class) ())
(defclass primary-class3 (everybody-inherit-class) ())
(defclass primary-class4 (everybody-inherit-class) ())

(defclass around-class1  () ())
(defclass around-class2  () ())
(defclass around-class3  () ())
(defclass around-class4  () ())

(defclass class1  (primary-class1) ())
(defclass class2  (primary-class2) ())
(defclass class3  (primary-class3) ())
(defclass class4  (primary-class4) ())
(defclass class11 (around-class1 primary-class1) ())
(defclass class22 (around-class2 primary-class2) ())
(defclass class33 (around-class3 primary-class3) ())
(defclass class44 (around-class4 primary-class4) ())

(defmethod really-fast-foo ((self everybody-inherit-class))
  ;; Method not defined on any other classes.

(defmethod fast-foo ((self everybody-inherit-class))
  ;; Method with one primary and one :around method defined.

(defmethod slow-foo ((self everybody-inherit-class))
  ;; Method with primary defined separately on 5 classes and
  ;; :around on 4 classes.

(defmethod slow-foo ((self primary-class1))

(defmethod slow-foo ((self primary-class2))

(defmethod slow-foo ((self primary-class3))

(defmethod slow-foo ((self primary-class4))

(defmethod fast-foo :around ((self around-class1))
  (cons 'fast-foo-around (call-next-method)))

(defmethod slow-foo :around ((self around-class1))
  (cons 'slow-foo-around1 (call-next-method)))

(defmethod slow-foo :around ((self around-class2))
  (cons 'slow-foo-around1 (call-next-method)))

(defmethod slow-foo :around ((self around-class3))
  (cons 'slow-foo-around1 (call-next-method)))

(defmethod slow-foo :around ((self around-class4))
  (cons 'slow-foo-around1 (call-next-method)))

(defvar i1  (make-instance 'class1))
(defvar i2  (make-instance 'class2))
(defvar i3  (make-instance 'class3))
(defvar i4  (make-instance 'class4))
(defvar i11 (make-instance 'class11))
(defvar i22 (make-instance 'class22))
(defvar i33 (make-instance 'class33))
(defvar i44 (make-instance 'class44))


The following is the (gruesome) benchmark macro that I have built to
measure run times as accurately as possible across platforms.  I apologize
for its length, but timing differences such as these are hard to tease out
(because of the need to average over a large number of calls and problems with
inconsistent interruptions such as garbage collections during a large
number of calls).


(defmacro time-test (&rest args)
  ;;   Macro which attempts to accurately measure the amount of real time
  ;; (in microseconds) that expressions take to run when compiled.
  ;;   Example call:  (time-test (sin 3.2) (princ-to-string (sin 3.2)))
  ;;   To do this, each of the expressions being tested is used as the body of
  ;; a separate function having a null lambda list (named #'expr1, #'expr2, etc)
  ;; which is then compiled.  To calculate the time taken, the function is
  ;; essentially cycled Iter of times, with the average time taken by each
  ;; call of the expression being equal to the total time to iterate the
  ;; function it is the body of minus the total time taken to iterate (for the
  ;; same number of cycles) an empty function (#'empty-lambda).
  ;;   Because the real time taken for each function call will vary
  ;; depending upon other processes running on the machine, garbage
  ;; collections, and the state of virtual memory, the iterations are
  ;; broken up into a number of different Epochs (defaults to 10, but can be
  ;; passed as a key) for which Iter iterages are done and the average time
  ;; per function call taken for each.  At the end of all of the epochs,
  ;; the average amounts of time taken for each expression are printed out,
  ;; first overall, and then successively with the epochs whose average time
  ;; deviates most from the mean average time removed.  Thus the final line
  ;; printed tends to show the average amount of time taken for each expression
  ;; function calls, minus expression calls on which garbage collections, etc.
  ;; were done, and therefore (hopefully) close to the "true" real time taken
  ;; by the expression.
  ;;   E.g.
  ;;   > (time-test (sin 3.2) (princ-to-string 3.2) :epochs 6)
  ;;   Epochs 6, iterations 150...
  ;;      6      2,684    720,555     31,890
  ;;      5      2,004    732,108     29,970
  ;;      4      3,077    727,762     29,412
  ;;      3      2,777    719,136     29,601
  ;;      2      2,618 19,049,310     29,426
  ;;      1      3,480    721,983     28,989
  ;;           0.07396  100.75935    0.79684
  ;;           0.07806   19.31490    0.78612    5 best
  ;;           0.07437   19.26291    0.78939    4 best
  ;;   In the above, the first six lines are each of the epochs, with the
  ;; numbers representing total time (in internal clock units) taken to do
  ;; 150 iterations minus the time for 150 iterations of the #'empty-lambda.
  ;; The last column is the time taken for 150 iterations of #'empty-lambda.
  ;; E.g. on the first epoch (#6), 150 (sin 3.2) took 2,684 clock units and
  ;; (princ-to-string 3.2) took 720,555.  Notice that on the second to last
  ;; epoch a garbage collection happened during the princ-to-string iterations.
  ;; The final lines shows the average time, in microseconds, for each call
  ;; of expression, with the last line showing the average over the 4 best
  ;; (i.e. closest to mean average time) epochs.
  ;;   NOTE:  The fewer other processes running on the machine at the same
  ;; time (even typing), the more accurate a measure this macro will return.
    ((expressions (before-keyword args))      ; List of expressions to compare
     (expr-fns NIL))                          ; Names of expression functions
     ;; Build functions (empty-lambda), (expr1), (expr2), ...
     (defun empty-lambda () NIL)
     ,@(let ((expr-num 0)
             (expr-fns-defun-code NIL))
         (dolist (expr expressions)
           (incf expr-num)
           (push (read-from-string
                   (concatenate 'string "EXPR" (princ-to-string expr-num)))
           (push `(defun ,(car expr-fns) () NIL ,expr)
         (push 'empty-lambda expr-fns)
     ;; Compile functions (empty-lambda), (expr1), (expr2), ...
     ,@(let ((code NIL))
         (dolist (expr-fn expr-fns)
            (push `(unless (compiled-function-p #',expr-fn)
                     (compile ',expr-fn))
     (let ((start-time nil)
           (end-time nil)
           (empty-lambda-time nil)
           (iter   ,(key-value :iter args))     ; Iterations per comparison
           (epochs ,(key-value :epochs args))   ; Number of separate comparisons
           (time   ,(or (key-value :time args)  ; Desired total time comparing
                        (* (length expressions) 10))))
       ;; Make sure expressions are in core memory.
       ,@(let ((code NIL))
           (dolist (expr-fn expr-fns)
             (push `(,expr-fn) code))
       ;; Estimate time for one iteration through all functions.
       (setf start-time (get-internal-real-time))
       (dotimes (i 10)
         ,@(let ((code NIL))
             (dolist (expr-fn expr-fns)
               (push `(,expr-fn) code))
       (setf end-time (get-internal-real-time))
       (setf empty-lambda-time (/ (/ (- end-time start-time) 10)
       ;; Calculate iter and/or epochs from this (if not provided).
       (cond ((null iter)
              (unless epochs
                (setf epochs 10))
              (let* ((exact-iter   (/ (/ time empty-lambda-time) epochs))
                     (iter-divider (expt 10 (1- (round (log exact-iter 10))))))
                (setf iter (* iter-divider
			      (ceiling (/ exact-iter iter-divider))))))
             ((null epochs)
              (setf epochs (round (/ (/ time empty-lambda-time) iter)))))
       (format t "~%Epochs ~A, iterations ~A...~%" epochs iter)
       ,@(let ((code NIL))
           (dolist (expr-fn expr-fns)
             (push `(setf (get ',expr-fn 'epoch-times) NIL) code))
       (dotimes (i epochs)
         ;; Calculate one epoch's time for empty-lambda function call.
         (setf start-time (get-internal-real-time))
         (dotimes (i iter)
         (setf end-time (get-internal-real-time))
         (setf empty-lambda-time (- end-time start-time))
         (push empty-lambda-time (get 'empty-lambda 'epoch-times))
         (format t "~4D" (- epochs i))
         ;; Calculate and display one epoch's time for each expr.
         ,@(let ((code NIL))
             (dolist (expr-fn expr-fns)
               (unless (eq expr-fn 'empty-lambda)
                 (push `(progn
                          (setf start-time (get-internal-real-time))
                          (do ((i 0 (1+ i))) ((>= i iter))
                          (setf end-time (get-internal-real-time))
                          (format t "~10:D" (- (- end-time start-time)
                          (push (- (- end-time start-time) empty-lambda-time)
                                (get ',expr-fn 'epoch-times)))
         (format t "~10:D~%" empty-lambda-time))
       ;; Display average times (in milliseconds) for each expression,
       ;;   removing the worst deviator each time, until only the 50%
       ;;   closest to the mean have their average displayed.
       (dotimes (num-remove (round (/ epochs 2)))
         (format t "    ")
         ,@(let ((code NIL))
             (dolist (expr-fn expr-fns)
                 `(format t "~10,5F"
                    (/ (/ (list-mean
                            (setf (get ',expr-fn 'epoch-times)
                                    (get ',expr-fn 'epoch-times)
                                    :number (if (eq num-remove 0)
                      (/ internal-time-units-per-second 1000)))
         (if (eq num-remove 0)
             (format t "~%")
           (format t "~5D best~%" (- epochs num-remove))))))))

(defun before-keyword (list)
  ;;  Return all of the elements in the list before the first keyword.
  (do ((elements list (cdr elements))
       (before-list NIL))
      ((or (null elements) (keywordp (car elements)))
       (reverse before-list))
      (setf before-list (cons (car elements) before-list))))
(defun key-value (key
                  (default-value NIL))
  ;;   If key is on lambda-list, then its value is returned, otherwise
  ;; default-value is returned.
  (do ((elements lambda-list (cdr elements)))
      ((or (null elements)
           (eq (car elements) key))
       (if elements (cadr elements) default-value))))

(defun list-mean (list &key (key #'identity))
  "Returns the mean of the list."
  (float (/ (let ((sum 0))
              (dolist (element list)
                (setf sum (+ sum (funcall key element))))
            (length list))))

(defun remove-mean-deviators (list 
                              (number NIL)
                              (percentage NIL)
                              (key #'identity))
  "Returns list with the elements which deviate the most from the list's
   mean removed.  Number removed equals either number or percentage of
   the list's length."
  (unless number
    (setf number
          (if percentage
              (floor (* (length list) percentage))
  (cond ((<= number 0) list)
        (t (remove-mean-deviators
             (do ((elements  list (cdr elements))
                  (pos       0    (1+ pos))
                  (list-mean (list-mean list :key key))
                  (worst-pos 0)
                  (worst-deviation 0)
                  (current-deviation NIL))
                 ((null elements)
                  (remove-nth worst-pos list))
               (when (> (setf current-deviation
                              (abs (- (funcall key (car elements)) list-mean)))
                 (setf worst-deviation current-deviation)
                 (setf worst-pos pos)))
             :number (1- number)
             :key key))))

(defun remove-nth (n list)
  ;;  Return a list that is list with its nth element removed.
  (let ((nth (nthcdr n list)))
    (append (ldiff list nth) (cdr nth))))


The result in Lucid on the Sun 4:

> (time-test (really-fast-foo i1)
             (fast-foo i1)
             (slow-foo i1)
             (slow-foo i2)
             (slow-foo i3)
             (slow-foo i4)
             (fast-foo i11)
             (slow-foo i11)
             (slow-foo i22)
             (slow-foo i33)
             (slow-foo i44))

Epochs 10, iterations 6000...
       0.01069   0.01916   0.02417   0.02933   0.02427   0.04161
       0.10385   0.10587   0.12332   0.11860   0.09630

Here the time-test benchmark does 10 "epochs", with each epoch doing first
6000 iterations of (really-fast-foo i1), then 6000 iterations of
(fast-foo i1), etc.  The final numbers are the averaged time (in msec)
for each function call over all epochs.

As you can see, a call of the generic function #'really-fast-foo, only defined
as a primary on everybody-inherit-class, here takes ~0.011 msec.  A call of
(fast-foo i1), which is also specialized as an :around on around-class1,
takes nearly twice as long (0.019 msec), even though i1 doesn't
inherit that :around.  The #'slow-foo generic function calls, which are
specialized on multiple different classes, take anywhere from 2-4 times
as long (0.024 - 0.041 msec).  The differences aren't quite as large
on the :around-using classes (i11, i22, i33, and i44), but it is the
calling of these methods that causes the slowdowns on the active
primary-only using methods.  Slowdowns get even more dramatic for
generic functions having larger numbers of different specialized 
methods that are "mixed and matched" more normally than these.

I couldn't illustrate any significant differences in run times with
this test file in Franz, though I do get them (up to 10 times slowdown,
as I do in Lucid) with my large program that has some generic-functions
specialized on over 20 different classes with a number of variable mixins
on run-time classes.

Another question brought up by all this is why generic function calls on
classes that have :arounds with one call-next-method take so much longer
than calls on classes that only have a primary?  E.g. fast-foo on class11
here takes 5x as long (3x as long in Franz) as fast-foo on class1 (0.104 msec
vs 0.019 msec).  Isn't all of the "hard" work done in looking up the
effective method for a class in the first place? If the effective methods
were computed and stored for each run-time class, then a call-next-method
should (basically) involve primarily an additional funcall of a (known)
next method function, which should be trivial in comparison to the
effective-method-function lookup done by the generic function.

Hopefully this will help you see what is going on.

Thanks again for looking into these questions,

- Trent Lange