
Re: associative methods



> Date: Sun, 4 Oct 1992 19:55:21 +0100
> From: Jonathan Bachrach <Jonathan.Bachrach@ircam.fr>
> 
> In Dylan there are a number of methods that take an unlimited number
> of arguments and convert to `binary' methods (e.g., = translates to
> binary=).  Unfortunately, there is no way for a user to define their
> own and I'm positive that users are going to want their own.  Sure,
> one can simulate this conversion, but it is inefficient (i.e., conses
> #rest list on heap and is performed at run-time).
> 
> One way to make this facility available to users would be to introduce
> a new construct such as `define-associative-method' for this facility.
> 
> Date: Mon, 05 Oct 92 01:56:37 -0400
> From: Scott_Fahlman@SEF-PMAX.SLISP.CS.CMU.EDU
> 
> In my opinion, your proposed solution is too specialized and doesn't get to
> the heart of the problem.  The real problem is that the #rest machinery is
> inherently expensive to use, since it is defined as creating a sequence at
> runtime.  The proverbial SSC ("sufficiently smart compiler") could prove in
> many cases that the sequence is confined within the method's dynamic extent
> and can therefore be stack-allocated, but I think that users will become
> confused and anxious about which cases heap-allocate and which don't.
> 
> For cases like + and =, in which an unlimited number of arguments are
> consumed locally, I like the idea of providing some mechanism like the old
> Maclisp LEXPRs.

I don't think making small improvements in the rest argument mechanism is a 
good way to tackle this problem.  Much better is to compile out all of the rest argument processing and remove the extra level of function call entirely.

This can be done without going outside the existing Dylan language, so long as 
there is a way for the compiler to know which functions are safe to compile 
inline.  For example (using the Common Lisp way to tell that to the compiler):

  (declaim (inline +))

  (define + (method (#rest numbers)
              (if (empty? numbers) 0 (reduce1 binary+ numbers))))

If the built-in generic function reduce1 is sufficiently sealed that the compiler knows what it does with those arguments, that's enough information for the compiler to convert calls to + into calls to binary+, with no runtime overhead.  In (+ a b c), it knows numbers is a 3-element sequence of the values of a, b, and c, and it inlines the if, empty?, and reduce1, yielding

  (binary+ (binary+ a b) c)
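
Spelled out, the chain of folds goes roughly like this (just a sketch of the intermediate steps; a real compiler works on its internal representation rather than on source text):

  (+ a b c)
  ;; inline the body of +; numbers is known to be the 3-element
  ;; sequence of the values of a, b, and c
  (if (empty? numbers) 0 (reduce1 binary+ numbers))
  ;; empty? of a known 3-element sequence folds to #f, so the if folds away
  (reduce1 binary+ numbers)
  ;; inline the sealed reduce1 over a sequence of known length
  (binary+ (binary+ a b) c)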

And of course this technique works just as well for functions defined by the user that are analogous to + and binary+.  This is not an exotic or difficult optimization.
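
For instance, a user who has written a binary-max generic function could get the same compilation for an n-ary max* just by defining it the same way (max* and binary-max are names made up for this example, not anything in the Dylan book):

  (declaim (inline max*))

  (define max* (method (#rest numbers)
                 ;; max has no identity element, so an empty argument
                 ;; list is simply left for reduce1 to reject
                 (reduce1 binary-max numbers)))

Given that, (max* a b c) comes out as (binary-max (binary-max a b) c) by exactly the same chain of folds.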

= to binary= is a little trickier since the necessary higher-order function, a 
cross between reduce1 and every?, is not already built into the language.  
However, the same principle applies; the higher-order function just has to be 
defined and declaimed inline.  It would go something like this (simplified to work only for lists instead of general sequences, for ease of exposition):

  (declaim (inline pairwise-every?))

  (define pairwise-every?
          (method (function list)
            (bind-methods ((loop (item list)
                             (cond ((empty? list) #t)
                                   ((function item (car list))
                                    (loop (car list) (cdr list)))
                                   (else: #f))))
              (loop (car list) (cdr list)))))

  (declaim (inline =))

  (define = (method (#rest objects) (pairwise-every? binary= objects)))
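
Assuming the compiler can fold the known-length sequence the same way as before, a call such as (= a b c) would then come out roughly as (again only a sketch of the intended result):

  (if (binary= a b)
      (if (binary= b c) #t #f)
      #f)

That is just the code one would have written by hand, with no #rest list constructed at run time.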

What about associativity (only mentioned in the subject line!)?  Well, that's a property of binary+ and binary=, not of + and =, and allows further optimization of the inlining.  For example, it would allow the transformation

    (binary+ (binary+ a 2) 5) -> (binary+ a (binary+ 2 5)) -> (binary+ a 7)

However, the Dylan book does not say binary+ and binary= are associative, so there might be non-associative user-defined methods and that transformation might not be correct.  But I guess associativity wasn't the real subject of this discussion anyway.

> For example, the arglist might contain "#MORE N", which
> would bind n to the number of arguments that were passed.  Then the method
> can iterate, using some form like (MORE-ARG i) to access each argument i.
> In this case, there would be no runtime consing.  #MORE might replace
> #REST (with the addition of an ARGS-TO-LIST function), or the two could
> coexist as alternative arg-list forms.

To my way of thinking, this partial proposal is just a strange variation on the existing sequence primitives.  more-arg is element, except that it gets its first argument from a hidden variable.  Instead of adding such a duplicate mechanism to the language, we would be better off all around making the existing mechanism work usefully.  I don't think it's at all difficult for a compiler to detect that the only use of a #rest parameter is as the first argument to the functions size and element, and to compile it the same way your proposal would.  In fact we don't need to be that restrictive; we can certainly also allow it to be the last argument to apply and still not materialize it as one of the standard sequence classes.  Even if these compiler optimizations were difficult, and they're not, I would still think it a good tradeoff to make compiler implementation more difficult in order to make use and understanding of the language less difficult.
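
To make that concrete, here is the kind of method I have in mind (sum* is a name made up for this example, and of course the compilation strategy is up to the implementation): the #rest parameter is consumed only by size and element, so the compiler can leave the arguments wherever they were passed and turn element into an indexed fetch, without ever building a <list> or <vector>.

  (define sum*
          (method (#rest numbers)
            ;; numbers is used only as an argument to size and element,
            ;; so no sequence object ever needs to be materialized
            (bind-methods ((loop (i total)
                             (if (= i (size numbers))
                                 total
                                 (loop (+ i 1)
                                       (binary+ total (element numbers i))))))
              (loop 0 0))))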

> I think that users will become
> confused and anxious about which cases heap-allocate and which don't.

Scott, I think the argument you are really making here is an argument against abstraction.  You are saying that any language that is more abstract than C will cause users to become confused and anxious, because they can't see from the source code exactly what the compiled code is going to be.  There is certainly some evidence for this position, but if it's really true, we may as well give up and go home because there is no hope for us.

I would rather believe that properly implemented and supported abstract languages can be successful and deliver real value to their users.  Getting back to optimization of #rest parameters, this means the compiler has to be accountable and transparent (I think your Python compiler at CMU has taken some giant steps in this direction).  Surely it also means that a decent development environment must include good performance analysis tools, so users can find the real bottlenecks in their programs instead of relying on superstitions such as "rest arguments are slow", which develop in any user community where the implementation is not accountable and good tools are lacking.

To conclude this overlong message:

So instead of adding warts to the language, let's talk about whether we need a 
standardized way to control inlining, or whether sealing takes care of that, 
or whether it should be an implementation-dependent development environment 
feature, or whether the compiler can do it on its own without any additional 
input from the programmer.  After we deal with that, let's talk about ideas for the programmer's interface to the compiler that make the compiler's optimizations transparent and accountable.