[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: Compilation of methods per class.



The basic PCL architecture does something very close to what (I believe)
you are suggesting.  Whether a given port of PCL actually produces code
of the same quality as your example would compile to depends on a number
of things.  First among these is whether the fast %FUNCALL and %APPLY
instructions were properly ported.

I am not saying that PCL produces exactly the same code, its code looks
more like:

(defun foo-runtime12 (self)
  (let ((*next-methods* '#,(list #'foo-class1-lambda self)))
    (foo-runtime12-inner self)))

(defun foo-runtime12-inner (self)
  (declare (special *next-methods*))
  (let ((next-method (car *next-methods*)))
    (foo-class2-body-begin)
    (%funcall #'next-method self)        ;You should read this as a funcall
                                         ;that needs no error-checking. The
                                         ;number of arguments are sure to be
                                         ;right.
    (foo-class2-body-end)))

As compared to your version:   
   (defun foo-runtime12-lambda (self)
     (foo-class2-body-begin)
     (funcall #'foo-class1-lambda self)
     (foo-class2-body-end))

PCL does the following extra work:

 % 1 tail-recursive function call (foo-runtime-12 to foo-runtime-12-inner)
 % special variable binding
 % special variable access
 % special variable unbinding
 * unsafe CAR (memory read)

Admittedly, that is a reasonable amount of extra work but it is
important to note that the operations marked with "%" instead of "*"
wouldn't have to be done in a better port of the PCL architecture.  That
is, if when rewriting PCL you are allowed to talk to the innards of the
compiler and agree about a special place to pass the next methods, you
can just pass these in a register, and the tail-recursive fn call can
really be a jump.  You would still have to do the unsafe CAR though, and
the jump would, in all likelyhood, be out-of-line.

One other point is that the technique PCL is using avoids the problem of
increased code size you point out.  In general, the PCL architecture
tries pretty hard to decrease the amount of runtime code because back
when we first started thinking about it machines had modest sized
instruction caches.

So, I guess what I am saying is that if you are using PCL, you may not
get a tremendous amount of boost out of the optimization you show.  It
depends on what port you are using.

But, there are some related optimizations that may really win for you.
For example, consider half-a-dozen classes (c1 - c6), and some generic
functions (g1-g3).  Suppose that each generic function accepts only one
argument, and that each generic function has a method specialized to
each class.  Suppose moreover, that g1 is the real external entrypoint,
and that all methods on g1 must call g2 and g3.

(defclass c1 () ())       (defmethod g1 ((x c1))
(defclass c2 () ())         (something-c1 (g2 x) (g3 x)))
.                
.                         (defmethod g1 ((x c2))
                            (something-c2 (g2 x) (g3 x)))

(defmethod g2 ((x c1))    (defmethod g3 ((x c1))
  ..)                       ..)
(defmethod g2 ((x c2))    (defmethod g3 ((x c2))
  ..)                       ..)
 .                          .
 .                          .

The point being that once you are inside a method on g1, you can know
what method on g2 and g3 you will end up calling.  Then, it is worth
rewriting methods for g1, that have direct calls to the appropriate
methods for g2 and g3.

PCL doesn't do this, I wish it did.  The best work on doing stuff like
this is, to my knowledge, the SELF work done at Stanford.

One final note, when thinking about CLOS implementation, I have found it
easier to keep my thoughts straight if I avoid the use of phrases like
"each class would have its own code for each method."  Thinking of
methods as being associated with classes seems to make it difficult to
generalize to multi-methods.