[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

proposal: eliminate forced consing



Issue:         ELIMINATE FORCED CONSING

References:    CLtL section 14.1,3,5; 15.2,5; 17.4; 18.3; 21.2

Category:      ADDITION

Edit history:  Version 1, 20-Jul-88, Ginder

Problem description:

The sequence functions included in CLtL encourage a programming style
that causes excessive storage allocation compared to libraries of
routines with similar functionality in other languages, notably C.
The only options available to the Common Lisp programmer who uses
these functions are to generate a newly-allocated sequence or to
destructively modify the argument sequence(s) given the function.

Proposal:

Add a :TARGET keyword argument to those sequence functions where such an
argument is useful, as specified below, which allows passing a target
argument sequence into which the result of the sequence computation is
to be placed.  The sequence function returns the target sequence
modified as specified below:

	(1) The target sequence must accomodate elements of the type(s)
	in the source sequence.  Thus it would be an error to give a
	target string argument if the source sequence had elements that
	were not of type STRING-CHAR.

	(2a) A non-list target sequence should have an allocated length
	long enough to accomodate the normal result of the sequence
	function.  It is permissable for the target sequence to have a
	fill pointer; in this case, the fill pointer is set to point
	immediately after the last element filled in by the sequence
	computation.  If the target sequence is longer than necessary,
	the unneeded trailing elements are unchanged.

	(2b) A list target sequence argument is extended with new conses
	to be as long as necessary to accomodate the resulting sequence,
	if not enough conses are supplied and the :TARGET-FROM-END keyword
	is nil.  The tail of the target list sequence not used to
	accomodate the sequence computation is returned as a second
	value by the sequence function, but remains linked as the tail
	of the target sequence.  [It might be desirable to return a
	third value, the last cons of the part of the list used for
	the result sequence, permitting the programmer to null-terminate
	that part of the target sequence, breaking the link with the
	unused tail, if appropriate.]

	(3) A :TARGET-FROM-END keyword is supported.  If non-nil, the
	target sequence is filled with new elements starting at the
	end of the target seqeunce, effectively reversing the order of
	elements of the resulting sequence in the target.  In this case,
	an error is signalled if the target sequence is not long enough
	for the result.  If the target sequence is longer than necessary,
	leading elements are unchanged.

	(4) :TARGET-START and :TARGET-END keywords are supported.
	TARGET-START and TARGET-END determine where in the target
	sequence the result of the sequence computation is placed.
	An error is signalled if the sub-length of the target
	sequence specified by these arguments is not long enough to
	accomodate the resulting sequence computation.  If a longer
	than necessary sub-length is specified, then the elements in
	the unneeded part of the specified sub-length are unchanged.

Affected sequence functions:
	subseq, copy-seq, reverse, remove, remove-if, remove-if-not,
	remove-duplicates, substitute, substitute-if, substitute-if-not,
	merge

Affected list functions:
	copy-list, butlast

	copy-alist, copy-tree, adjoin, union, intersection, set-difference,
	set-exclusive-or
 	  [no TARGET-START/END or TARGET-FROM-END, just destructive
	   use of the :TARGET conses]

Affected string functions:
	string-trim, string-left-trim, string-right-trim, string-upcase,
	string-downcase, string-capitalize 

Examples:

(copy-seq '(1 2 3) :target '(a b c d e f g))
	=> (1 2 3 d e f g)

(copy-seq '(1 2 3) :target '(a b c d e f g) :target-from-end t :end 5)
	=> (a b 3 2 1 f g)

(remove 'a '(b b a b b a b b a b b) :count 1 :target '(3))
	=> (b b b b a b b a b b)   ; EQ to :TARGET arg, CDR is new conses.

;;; Note, the :TARGET arg has a fill pointer.
(substitute #\a #\b "This is a string with 2 a's"
	    :target "0123456789012345678901234567890123456")
	=> "This is b string with 2 b's"

In a related addition, provide extended versions of concatenate, append,
revappend, and make-string-output-stream as follows:

	concatenate-into target &rest sequences
	  Like CONCATENATE, but the result type is determined to be
	  the type of TARGET.  The result is TARGET containing as many
	  of the elements of SEQUENCES as can be accomodated by the
	  allocated length of TARGET.  TARGET's fill pointer, if
	  present is set according to how many elements of TARGET ar
	  filled by this operation.

	concatenate-into-subseq target start end &rest sequences
	  Like concatenate-into, but copied from SEQUENCES into	the
	  sub-sequence of TARGET specified by START and END.

	append-into target &rest lists
	  Like APPEND, but the copied list arguments are copied into
	  conses taken from TARGET.  The last list in LISTS
	  is not copied, as in APPEND, rather, the last cons used
	  from TARGET is given the last list in LISTS as its cdr.
	  The result is EQ to TARGET (unless a single list is appended),
	  but contains only those conses needed to hold the appended
	  lists' elements.  The tail of unused conses from TARGET is
	  returned as a second value; new conses are allocated if
	  TARGET supplies an insufficient number of conses.

	revappend-into target x y
	  Like REVAPPEND, but the new conses are taken from TARGET.
	  The result is EQ to TARGET, but contains only those conses
	  needed to hold X's elements.  The tail of unused conses
	  from TARGET is returned as a second value; new conses are
	  allocated if TARGET supplies an insufficient number of conses.

	make-string-output-stream &optional string
	  Like the current MAKE-STRING-OUTPUT-STREAM, except if STRING
	  is provided, make the string-stream write to it rather than
	  a newly-allocated string.  An error is signalled if the output
	  overflows the supplied string.

A related optimization that compilers should be encouraged to support
is to recognize:

   (concatenate 'string (subseq ...) (subseq ...) (subseq ...) ...)

as being optimizable so as to avoid the allocation of storage for
intermediate SUBSEQ results.  This is a very common programming idiom
that can, of course, be expressed using the proposed extensions to avoid
intermediate consing.  However, the extensions do not provide as concise
and readable a mechanism for re-espressing this idiom as may others.
Other extensions might be considered to address this:

	(1) Provide a CONCATENATE function that takes a &rest
	argument of sequence/start/end triples.

	(2) Provide a special form like:
	       (designate-subseq sequence start end)
	that didn't actually allocate anything but provided an imple-
	mentation-specific "handle" on the subsequence for use
	in such expressions as the CONCATENATE expression above.
	(They'd print as the subseq, and setf's of their elements
	would destructively modify the original sequence.)	

However, there is no precedent in CLtL for (1).  And (2) seems like a
complex, sweeping change (for implementors) of limited benefit somewhat
in conflict (though more general than) displaced arrays.  So I don't
want to propose either.

Rationale:

It is sometimes better to use a more complex programming construct for
sake of efficiency than to use a less complex, more expensive one.
Excessive storage allocation is one of Lisp's most widely publicized and
least understood faults.

Current practice:

Similar functionality is provided for bit-vectors, as specified in CLtL
17.4.  A related capability is provided in the destructive versions of
some of the functions extended above.

When functionality similar to this is required, users must "roll their
own".  My experience in several commercial Lisp applications is that
avoiding storage allocation by using programming techniques such as this
may be critical to the success of a project.  The current sequence,
list, and string functions encourage an "expensive", albeit easier to
use, programming style that invites the creation of programs whose
performance suffers greatly in comparison to a program written to solve
the same problem written in C.

Cost to Implementors:

The cost of implementation of these extra sequence function arguments is
significant, but would not be burdensome compared to the utility
provided.  (In my humble opinion, of course.)

Cost to Users:

No cost, this is upward compatible.

Cost of non-adoption:

I submit that the cost of NOT implementing this sort of functionality is
that Common Lisp deserves the reputation it is rapidly gaining of being
very expensive to use, not in programmer productivity, but in the
resources required by the resulting programs.  While it is certainly
easier to write programs that always allocate new storage for results,
it is very expensive compared to what average programmers write in other
languages.  Sometimes the correct trade-off is to invest the extra
programming effort in development of a product in exchange for the
lowered resource requirements in the delivered system.  This is
particularly true when the deployed Lisp application may have thousands
of instances.  It may well be worth several extra man-years of effort to
use facilities such as those mentioned above rather than pay the
increased cost of deployment thousands of times!  (These costs are in
the form of more expensive hardware for deployment.  This might be due
to higher memory requirements or larger disk space requirements to
support a large virtual memory image.) If Common Lisp does not provide
this capability; programmers under serious performance constraints will
choose a different tool.

Benefits:

Those programmers choosing to invest the extra effort needed to
correctly use such facilities can do so with less overhead than
previously required, and will see performance improvement.  Programmers
learning Common Lisp will be challenged to understand the difference
between allocating new storage and re-using old storage as they
understand why these arguments are supported.  This is one of the chief
problems that most programmers encounter as they write Lisp programs.
I've encountered MANY occurances of this problem in helping people who
are trying to write Lisp programs as part of a real-world application
that is intended to be delivered on hardware for which someone must pay
real $$.  The problem encountered is that the program is too slow, or
requires hardware too expensive for practical deployment.  The most
common result is that the programmer gives up, proclaiming that Lisp is
too slow or too resource intensive, and rewrites his program in C.
Although these enhancements alone are not sufficient to solve the cited
problem, something along these lines is necessary as part of a more
complete solution that might include better training of Lisp
programmers, stack consing, and ephemeral GC.  The real benefit of
adopting this extension to sequence functions and similar extensions is
an increased chance of survival for Lisp as a viable tool for creating
deliverable applications.

Esthetics:

This proposal is likely to cause flames from the "functional
programming" camp (unless they've all left for Scheme).  It is
consistent (syntactically) with the existing keyword arguments to
sequence functions.  It adds to the plethora of keyword arguments already
present for many of these functions.

Discussion:

I'd likely tolerate any sort of statements in the new CL manual warning
naive users of the dangers of this programming style.  I agree that it
can be mis-used; but sometimes the benefit is worth the risk.  I simply
want Common Lisp to provide the choice.

Would adoption of these arguments eliminate the need for parallel
destructive versions for some of the affected functions?  (E.g., does
REMOVE with a :TARGET argument eliminate the need for DELETE.)

--Joe Ginder, Inference