[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Issue "Eliminate forced consing" re-visited



Wow, what a yawn.  (In response to my original proposal.)  After waiting a
month for comments, flames, whatever, I decided to re-send an edited version
of my original proposal in an attempt to generate some discussion.  Is anyone
else trying to compete with C programs for efficiency?  Here it is:
=======================
Issue:         ELIMINATE FORCED CONSING

References:    CLtL section 14.1,3,5; 15.2,5; 17.4; 18.3; 21.2

Category:      ADDITION

Edit history:  Version 1, 20-Jul-88, Ginder
               Version 2, 22-Aug-88, Ginder
                   Moved discussion to the discussion section.
                   Changed multiple value proposal, clarified
                   certain points.

Problem description:

Some sequence, list, and string functions in CLtL encourage a
programming style that uses excessive storage allocation compared to
libraries of routines with similar functionality in other languages,
notably C.  The only options available to the Common Lisp programmer who
uses these functions are to generate a newly-allocated sequence or to
destructively modify the argument sequence(s) given the function.  The
option of providing a sequence, list, or string into which the result of
a sequence operation should be placed is not available.

Proposal:

Add a :TARGET keyword argument to those sequence, list, and string
functions where such an argument is useful, as specified below, which
allows passing a target argument sequence into which the result of the
sequence computation is to be placed.  The sequence function returns the
target sequence modified as specified below:

(1) The target sequence must accomodate elements of the type(s) in the
source sequence.  Thus it would be an error to give a target string
argument if the source sequence had elements that were not of type
STRING-CHAR.

(2a) A non-list target sequence should have an allocated length long
enough to accomodate the normal result of the sequence function.  It is
permissable for the target sequence to have a fill pointer; in this
case, the fill pointer is set to point immediately after the last
element filled in by the sequence computation.  If the target sequence
is longer than necessary, the unneeded trailing elements are unchanged.

(2b) A list target sequence argument is extended with new conses to be
as long as necessary to accomodate the resulting sequence, if not enough
conses are supplied and the :TARGET-FROM-END keyword is nil.  The last
cons of the target list whose CAR was filled by the computation is
returned as a second value.  The CDR of this cons is that tail of the
target list not used to accomodate the sequence computation.  (I.e., The
unused conses remain as a tail of the target list that is returned.)
This permits the programmer to save the unused conses and then
null-terminate the head of the list used for the result.

(3) A :TARGET-FROM-END keyword is supported.  If non-nil, the target
sequence is filled with new elements starting at the end of the target
seqeunce, effectively reversing the order of elements of the resulting
sequence in the target.  In this case, an error is signalled if the
target sequence is not long enough for the result.  If the target
sequence is longer than necessary, leading elements are unchanged.

(4) :TARGET-START and :TARGET-END keywords are supported.  TARGET-START
and TARGET-END determine where in the target sequence the result of the
sequence computation is placed.  An error is signalled if the sub-length
of the target sequence specified by these arguments is not long enough
to accomodate the resulting sequence computation.  If a longer than
necessary sub-length is specified, then the elements in the unneeded
part of the specified sub-length are unchanged.

Changed sequence functions:
	subseq, copy-seq, reverse, remove, remove-if, remove-if-not,
	remove-duplicates, substitute, substitute-if, substitute-if-not,
	merge

Changed list functions:
	copy-list, butlast

	copy-alist, copy-tree, adjoin, union, intersection, set-difference,
	set-exclusive-or
 	  [no TARGET-START/END or TARGET-FROM-END, just destructive
	   use of the :TARGET conses]

Changed string functions:
	string-trim, string-left-trim, string-right-trim, string-upcase,
	string-downcase, string-capitalize 

Examples:

(copy-seq '(1 2 3) :target '(a b c d e f g))
	=> (1 2 3 d e f g)

(copy-seq '(1 2 3) :target '(a b c d e f g) :target-from-end t :end 5)
	=> (a b 3 2 1 f g)

(remove 'a '(b b a b b a b b a b b) :count 1 :target '(3))
	=> (b b b b a b b a b b)   ; EQ to :TARGET arg, CDR is new conses.

;;; Note, the :TARGET arg has a fill pointer.
(substitute #\a #\b "This is a string with 2 a's"
	    :target "0123456789012345678901234567890123456")
	=> "This is b string with 2 b's"

In a related addition, included here since it addresses the same problem
as the one above, provide extended versions of concatenate, append,
revappend, and make-string-output-stream as follows:

	concatenate-into target &rest sequences
	  Like CONCATENATE, but the result type is determined to be
	  the type of TARGET.  The result is TARGET containing as many
	  of the elements of SEQUENCES as can be accomodated by the
	  allocated length of TARGET.  TARGET's fill pointer, if
	  present is set according to how many elements of TARGET ar
	  filled by this operation.

	concatenate-into-subseq target start end &rest sequences
	  Like concatenate-into, but copied from SEQUENCES into	the
	  sub-sequence of TARGET specified by START and END.

	append-into target &rest lists
	  Like APPEND, but the copied list arguments are copied into
	  conses taken from TARGET.  The last list in LISTS
	  is not copied, as in APPEND, rather, the last cons used
	  from TARGET is given the last list in LISTS as its cdr.
	  The result is EQ to TARGET (unless a single list is appended),
	  but contains only those conses needed to hold the appended
	  lists' elements.  The tail of unused conses from TARGET is
	  returned as a second value; new conses are allocated if
	  TARGET supplies an insufficient number of conses.

	revappend-into target x y
	  Like REVAPPEND, but the new conses are taken from TARGET.
	  The result is EQ to TARGET, but contains only those conses
	  needed to hold X's elements.  The tail of unused conses
	  from TARGET is returned as a second value; new conses are
	  allocated if TARGET supplies an insufficient number of conses.

	make-string-output-stream &optional string
	  Like the current MAKE-STRING-OUTPUT-STREAM, except if STRING
	  is provided, make the string-stream write to it rather than
	  a newly-allocated string.  An error is signalled if the output
	  overflows the supplied string.

Rationale:

It is sometimes better to use a more complex programming construct for
sake of efficiency than to use a less complex, more expensive one.
Providing a set of sequence, list, and string functions that do not
require dynamic storage allocation provides a means for writing programs
that must avoid storage allocation while running as much as possible.
Excessive storage allocation (sometimes expressed as "Lisp garbage
collects too often") is one of Lisp's most widely publicized and least
understood faults.

Current practice:

Similar functionality is provided for bit-vectors, as specified in CLtL
17.4.  A related capability is provided in the destructive versions of
some of the functions extended above (e.g., REMOVE vs. DELETE).

When functionality similar to this is required, users must "roll their
own".

Cost to Implementors:

The cost of implementation of these extra sequence, list, and string
function arguments is significant, in that existing code to implement
these functions must be changed to use storage passed in as an argument
by the user.

Cost to Users:

No cost, this change upwards compatible.

Cost of non-adoption:

Some programmers will continue to "roll their own" storage re-using
code.  Others will not go to this effort, but will write programs that
require much garbage collection.  Lisp will continue to suffer from a
handicap in promoting efficient programs when compared to C.

Benefits:

Those programmers choosing to invest the extra effort needed to
correctly use such facilities can do so with less overhead than
previously required, and will see performance improvement.  Programmers
learning Common Lisp will be challenged to understand the difference
between allocating new storage and re-using old storage as they
understand why these arguments are supported.

Esthetics:

Neutral.  This proposal is syntactically consistent with the existing
keyword arguments to sequence, list, and string functions.  However, it
adds to the plethora of keyword arguments already present for many of
these functions.  Stylistically, it provides stronger support for the
programming style embodied by the use of the "destructive" versions of
many Common Lisp functions.

Discussion:

My experience in several commercial Lisp applications is that avoiding
storage allocation by using programming techniques such as this may be
critical to the success of a project.  The current sequence, list, and
string functions encourage an "expensive", albeit easy to use,
programming style that invites the creation of programs whose
performance suffers greatly in comparison to a C program written to
solve the same problem.  This applies particularly to string-hcaking
programs written using Common Lisp versus those written using the
standard string library for C.

I submit that the cost of NOT implementing this sort of functionality is
that Common Lisp, in many cases, deserves the reputation it is gaining
of being expensive to use, not in programmer productivity, but in the
resources required by the resulting programs.  While it is easier to
write programs that always allocate new storage for results, it is very
expensive compared to what average programmers write in other languages.
(Asside: I realize that this dredges up the functional programming
versus side-effect programming debate.  Common Lisp already supports
"destructive" versions of many functions.  Lisp compiler technology has
not progressed to the point of eliminating unnecessary storage
allocation in programs written in a purely functional style.)  Sometimes
the correct trade-off is to invest extra programming effort in
development of a program in exchange for lowered resource requirements
in the delivered system.  This is particularly true when the deployed
Lisp application may have thousands of instances.  It may well be worth
several extra man-years of effort to use facilities such as those
mentioned above rather than pay the increased cost of deployment
thousands of times!  (These costs are in the form of more expensive
hardware for deployment.  This might be due to higher memory
requirements or larger disk space requirements to support a large
virtual memory image and swap space.) If Common Lisp does not provide
this capability; programmers under serious performance constraints will
choose a different language, such as C.

Many programmers encounter problems writing efficient programs in Lisp.
I've encountered numerous occurances of this problem in helping people
who are trying to write Lisp programs as part of an application that is
intended to be delivered on hardware for which someone must pay real
money.  The superficial problem encountered is that the program is too
slow, or requires hardware too expensive for practical deployment.  The
most common result is that the programmer gives up, proclaiming that
Lisp is too slow or too resource intensive, and rewrites his program in
C.  Although the above enhancements alone are not sufficient to solve
the cited problem, something along these lines is necessary as part of a
more complete solution that might include better training of Lisp
programmers, standard support for stack consing, and ephemeral GC.  The
real benefit of adopting this extension to Common Lisp (and related
extensions) is an increased chance of survival for Lisp as a viable tool
for creating deliverable applications.

A related sort of optimization that compilers should be encouraged to
support is to recognize:

   (concatenate 'string (subseq ...) (subseq ...) (subseq ...) ...)

as being optimizable so as to avoid the allocation of storage for
intermediate SUBSEQ results.  This is a very common programming idiom
that can, of course, be expressed using the proposed extensions to avoid
intermediate consing.  However, the extensions do not provide as concise
and readable a mechanism for re-espressing this idiom as may others.
Other extensions could be considered to address this:

	(1) Provide a CONCATENATE function that takes a &rest
	argument of sequence/start/end triples.

	(2) Provide a special form like:
	       (designate-subseq sequence start end)
	that didn't actually allocate anything but provided an imple-
	mentation-specific "handle" on the subsequence for use
	in such expressions as the CONCATENATE expression above.
	(They'd print as the subseq, and setf's of their elements
	would destructively modify the original sequence.)	

However, there is no precedent in CLtL for (1).  And (2) seems like a
complex, sweeping change (for implementors) of limited benefit.  Also,
it is somewhat in conflict (though more general than) displaced arrays.
Neither are proposed.

I'd likely tolerate any sort of statements in the new CL manual warning
naive users of the dangers of this programming style.  I agree that it
can be mis-used and lead to nasty bugs; but sometimes the benefit is
worth the risk.  I simply want Common Lisp to provide the choice.

A related issue: would adoption of these arguments eliminate the need
for parallel destructive versions for some of the affected functions?
(E.g., does REMOVE with a :TARGET argument eliminate the need for
DELETE.)

An alternative to :TARGET for the keyword name is :DESTROY.

=======================