[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: remove



Espen is correct in that Hallvard's REMOVE variant isn't
tail-recursive and may therefore cause problems with long lists.  One
could code an iterative tail-preserving version (which might be slower).

But I did some benchmarking on Sun-4s under Allegro and CMU CL, and even
on longish lists (like 1000 elements or so), Hallvard's FASTER-REMOVE
performed better on the whole than the system-provided REMOVE versions.
I would bet that at least 90% of the uses of REMOVE would benefit from
using FASTER-REMOVE.

Let me just point out that is a bit unfair to compare FASTER-REMOVE
against CL REMOVE implementations because REMOVE allows :TEST/:TEST-NOT
and :KEY arguments to be provided, which makes it more difficult to
inline the test - FASTER-REMOVE uses EQ directly, which is usually
inlined.  REMOVE also has to cope with :START, :END, :COUNT and
:FROM-END arguments, although these are usually not provided.

On the other hand, the implementors probably thought that REMOVE would s
usually be called only in casewhere it would actually remove
something, so sharing structure with the original sequence (list)
wouldn't be worth the effort.  But since REMOVE often leaves a
significant tail portion intact, this assumption could have been wrong.

The moral is probably:

0. Don't use any CL functions other than EQ, ATOM, CAR, CDR, CONS(?),
   defstruct accessors and the fixnum and float operations in
   time-critical code :-)

1. Even for basic functions like REMOVE, it can be a good idea to define
   your own specialized versions that are tuned for application-specific
   performance.  Common Lisp's abundance of useful `generic' (not in the
   CLOS sense) functions is in part responsible for the reputation of CL
   as being `slow': If I want to remove an item from a list structure in
   C, I usually write the loop myself, with the appropriate testing
   inline.  Usually I will also write it to work destructively.  In CL I
   simply call REMOVE---this is guaranteed to work but is also guaranteed
   to scan the whole list (unless I say :COUNT which is often possible),
   and usually reconses the whole list even if nothing or just the first
   or second element has to be removed.

3. There is still some optimization potential for Lisp implementations.
   Consider REMOVE: in 90% of all cases it is called with the second
   argument being a list(*), with no :START/:END, :COUNT (and therefore
   no :FROM-END) and the :TEST argument being either the default (#'EQL)
   or #'EQ.  So by transforming these two cases into calls to specialized
   functions like SIMPLE-LIST-EQ-REMOVE and SIMPLE-LIST-EQL-REMOVE, a lot
   could be gained.  I was a bit surprised that I managed neither Allegro
   nor CMU CL to inline any call to REMOVE (well in CMU CL I finally
   defined the transformer), even in presence of a LIST type definition
   of the sequence argument.

   The sad thing about these possible optimizations is that they have a
   cost in terms of overall size of the system and the corresponding
   locality problems.  So the implementor has to find a good compromise
   between the extreme solutions (on one end of the spectrum: provide one
   single generic REMOVE function that handles lists, vectors, :TEST,
   :KEY, :COUNT, :FROM-END, and on the other end: provide inline dispatch
   to SIMPLE-LIST-EQ-REMOVE, SIMPLE-VECTOR-EQ-REMOVE,
   SIMPLE-LIST-TEST-NO-KEY-REMOVE, SIMPLE-LIST-EQ-COUNT=1-REMOVE,
   SIMPLE-LIST-EQL-COUNT=1-REMOVE,LIST-TEST-KEY-WITH-COUNT-FROM-END-REMOVE
   and so on, the same for DELETE, FIND, REMOVE-IF, DELETE-IF etc.).
   Clever "lazy" compilation schemes could help here.
-- 
Simon.

*) This is usually difficult for the compiler to prove.