[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Revised ELIMINATE-CONSING-PROPOSAL, version 3



Issue:         ELIMINATE FORCED CONSING

References:    CLtL section 14.1,3,5; 15.2,5; 17.4; 18.3; 21.2

Category:      ADDITION

Edit history:  Version 1, 20-Jul-88, Ginder
               Version 2, 22-Aug-88, Ginder
                   Moved discussion to the discussion section.
                   Changed multiple value proposal, clarified
                   certain points.
               Version 3, 31-Aug-88, Ginder
                   Revised according to various comments. Much of
                   my original discussion has been eliminated.

Problem description:

Some sequence, list, and string functions in CLtL encourage a
programming style that uses excessive storage allocation compared to
libraries of routines with similar functionality in other languages,
notably C.  The only options available to the Common Lisp programmer who
uses these functions are to generate a newly-allocated sequence or to
destructively modify the argument sequence(s) given the function.  The
option of providing a sequence, list, or string into which the result of
a sequence operation should be placed is not available.

Proposal:

Add mutually exclusive :RECYCLE or :MODIFY keyword arguments to those
sequence, list, and string functions where such arguments are useful.
:RECYCLE is used to pass a sequence to be recycled to hold the result of
the sequence, list, or string operation.  :MODIFY is used to pass a
sequence that is to be modified in some manner to contain the result of
the operation.  The distinction lies in that :MODIFY is akin to
destructively modifying a useful data structure, while :RECYLE is used
to simply recycle a data structure that would otherwise be useless and
possibly garbage (i.e., a data structure whose contents are no longer of
interest and may be altered arbitrarily).  The sequence, list, or string
function returns the MODIFY'd or RECYLE'd sequence as specified below.
It is an error to pass both a :MODIFY and a :RECYCLE argument.

[a] Support a MODIFY argument:

(1) The sequence to be modified must accomodate elements of the type(s)
that would have been stored in a newly-allocated sequence.  Thus it
would be an error to give a target string argument if the result of the
operation would normally have had elements that were not all of type
STRING-CHAR.

(2a) A non-list MODIFY argument should have an allocated length long
enough to accomodate the normal result of the sequence function.  The
sequence to be modified may have a fill pointer.  The fill pointer
remains unchanged by this operation so long as it points after the
modified elements of the sequence.  If the fill pointer points prior to
or in the range of the modified elements, it is set to point immediately
after the last element filled in by the sequence computation.  If the
sequence to be modifed is longer than necessary, the unneeded trailing
elements are unchanged.

(2b) A list MODIFY argument is extended with new conses to be as long as
necessary to accomodate the resulting sequence, if not enough conses are
supplied and the :MODIFY-FROM-END keyword is nil.  Two values are
returned:  the first is the modified list with any unused tail remaining
intact. (I.e., The unused conses remain as a tail of the modified list
that is returned.)  The second is that tail of the original list that
was not modified by the operation.  This second value is useful for
continued modification of the tail.

(3) :MODIFY-START and :MODIFY-END keywords are supported.  MODIFY-START
and MODIFY-END determine where in sequence the result of the sequence
computation is placed.  It is an error if the sub-length of the sequence
specified by these arguments is not long enough to accomodate the
sequence computation.  If a longer than necessary subsequence is
specified, then the elements in the unneeded part of the specified
subsequence are unchanged.

(4) A :MODIFY-FROM-END keyword is supported.  If non-nil, the sequence
to be modified is filled with new elements starting at MODIFY-END (e.g.,
the end of the sequence if MODIFY-END is not specified), effectively
reversing the order of elements of the as the sequence is modified.  It
is an error if the supplied sequence is not long enough for the result.
If the supplied sequence is longer than necessary, leading elements are
unchanged.

Sequence functions changed to include MODIFY, MODIFY-START, MODIFY-END,
MODIFY-FROM-END:
	subseq, copy-seq, reverse, remove, remove-if,
	remove-if-not, remove-duplicates, substitute, substitute-if,
	substitute-if-not, merge

List functions changed to include MODIFY, MODIFY-START, MODIFY-END,
MODIFY-FROM-END:
	copy-list, butlast

String functions changed to include MODIFY, MODIFY-START, MODIFY-END,
MODIFY-FROM-END:
	string-trim, string-left-trim, string-right-trim, string-upcase,
	string-downcase, string-capitalize

[a] Support a RECYLE argument:

(1) The sequence to be recycled must accomodate elements of the type(s)
that would have been stored in a newly-allocated sequence.  Thus it
would be an error to give a RECYLE string argument if the result of the
operation would normally have had elements that were not all of type
STRING-CHAR.

(2a) A non-list RECYCLE argument should have an allocated length long
enough to accomodate the normal result of the sequence function.  The
sequence to be recycled may have a fill pointer.  The fill pointer is
set to point immediately after the last element filled in by the
sequence computation.  If the sequence to be recycled is longer than
necessary, the unneeded trailing elements are unchanged.

(2b) A list RECYCLE argument is extended with new conses to be as long
as necessary to accomodate the resulting list, if not enough conses are
supplied.  Two values are returned:  the first is the result list
terminated with NIL.  (I.e., The unused conses are spliced out of the
recycled list that is returned.)  The second is that tail of the
original list that was not used by the recycling operation.

(3) No :RECYCLE-START, :RECYCLE-END, or :RECYCLE-FROM-END arguments are
supported.

Changed sequence functions:
	subseq, copy-seq, reverse, remove, remove-if,
        remove-if-not, remove-duplicates, substitute, substitute-if,
        substitute-if-not, merge

Changed list functions:
	make-list, copy-list, copy-alist, copy-tree, butlast, subst,
        sublis, adjoin, union, intersection, set-difference,
        set-exclusive-or

Changed string functions:
	string-trim, string-left-trim, string-right-trim, string-upcase,
	string-downcase, string-capitalize 

Examples:

(copy-seq '(1 2 3) :modify '(a b c d e f g))
	=> (1 2 3 d e f g)

(copy-seq '(1 2 3) :modify '(a b c d e f g) :modify-from-end t :modify-end 5)
	=> (a b 3 2 1 f g)

(remove 'a '(b b a b b a b b a b b) :count 1 :recycle '(3))
	=> (b b b b a b b a b b)   ; EQ to :recycle arg, CDR is new conses.

(substitute #\a #\b "This is a string with 2 a's"
	    :modify (make-array 37 :element-type 'string-char :fill-pointer 0))
	=> "This is b string with 2 b's" ; result EQ to MODIFY arg, 
					 ; result has a fill-pointer set to 27 

(substitute #\a #\b "ababa"
	    :modify (make-array 10 :element-type 'string-char
				   :initial-element #'q
	                           :fill-pointer 10))
	=> "aaaaaqqqqq" ; result EQ to MODIFY arg, 
			; result has a fill-pointer set to 10 

(copy-list '(a b c) :recycle '(1 2 3 4 5 6 7))
	=> (a b c)    ; first value
           (4 5 6 7)  ; second value

(copy-list '(a b c) :modify '(1 2 3 4 5 6 7))
	=> (a b c 4 5 6 7)  ; first value
           (4 5 6 7)        ; second value

In related additions, included here since they address the same problem
as above, provide extended versions of concatenate, append, revappend,
make-string-output-stream, and read-line as follows:

  concatenate-into target &rest sequences
    Like CONCATENATE, but the result type is determined to be
    the type of TARGET.  The result is TARGET containing as many
    of the elements of SEQUENCES as can be accomodated by the
    allocated length of TARGET.  TARGET's fill pointer, if
    present is set according to how many elements of TARGET are
    filled by this operation.
  
  map-into target function sequence &rest sequences
    Like MAP, but the result type is determined to be the type
    of TARGET.  The result of MAP-INTO is TARGET such that
    element j is the result of applying FUNCTION to element j
    of each of the argument sequences.  TARGET must be as long
    as the shortest of the input sequences.  TARGET's fill
    pointer, if present is set according to how many elements of
    TARGET are filled by this operation.
  
  append-into target &rest lists
    Like APPEND, but the copied list arguments are copied into
    conses taken from TARGET.  The last list in LISTS
    is not copied, as in APPEND, rather, the last cons used
    from TARGET is given the last list in LISTS as its cdr.
    The result is EQ to TARGET (unless a single list is appended),
    but contains only those conses needed to hold the appended
    lists' elements.  The tail of unused conses from TARGET is
    returned as a second value; new conses are allocated if
    TARGET supplies an insufficient number of conses.
  
  revappend-into target x y
    Like REVAPPEND, but the new conses are taken from TARGET.
    The result is EQ to TARGET, but contains only those conses
    needed to hold X's elements.  The tail of unused conses
    from TARGET is returned as a second value; new conses are
    allocated if TARGET supplies an insufficient number of conses.
  
  make-string-output-stream &optional string
    Like the current MAKE-STRING-OUTPUT-STREAM, except if STRING
    is provided, it must be a string with a fill pointer.  Output
    to the resulting stream is incrementally appended to STRING,
    as if using VECTOR-PUSH-EXTEND if the string is adjustable,
    and otherwise using VECTOR-PUSH.
  
  read-line-into-string string &optional input-stream eof-error-p
			eof-value recursive-p
    Like READ-LINE, but reads into STRING.  STRING must be a string
    with a fill pointer.  The result of READ-LINE-INTO-STRING is
    incrementally appended to STRING, as if using VECTOR-PUSH-EXTEND
    if the string is adjustable,and otherwise using VECTOR-PUSH.

In order to facilitate recycling alists and trees, the following two
functions are proposed.

  flatten-tree tree
    FLATTEN-TREE would take a tree and return a linear list made
    of all the conses in the tree, suitable for recycling via a
    :RECYLE argument.

  flatten-alist alist
    FLATTEN-ALIST would take an alist and return a linear list
    made of all the conses that would be copied by COPY-ALIST,
    suitable for recycling via a :RECYLE argument.

Rationale:

It is sometimes better to use a more complex programming construct for
sake of efficiency than to use a less complex, more expensive one.
Providing a functions that do not require dynamic storage allocation
provides a means for writing programs that must avoid storage allocation
while running as much as possible.  Excessive storage allocation
(sometimes expressed as "Lisp garbage collects too often") is one of
Lisp's most widely publicized faults.

Current practice:

Similar functionality is provided for bit-vectors, as specified in CLtL
17.4.  A related capability is provided in the destructive versions of
some of the functions extended above (e.g., REMOVE vs. DELETE).

When functionality similar to this is required, users must "roll their
own".

[from Moon] Symbolics Common Lisp already has MAP-INTO.

Cost to Implementors:

The cost of implementation of these extra sequence, list, and string
function arguments is significant, in that existing code to implement
these functions must be changed to use storage passed in as an argument
by the user.

Cost to Users:

No cost for existing programs, this change is upwards compatible.  Users
may mis-use the new keyword arguments and end up with buggy code,
similar to the problems encountered when using C library functions that
re-use data structures.

Cost of non-adoption:

Programmers will continue to "roll their own" non-standard storage
re-using code.  Others will not go to this effort, but will write
programs that require much garbage collection.  Lisp will continue to
suffer from a handicap in promoting efficient programs when compared to
C.

Benefits:

Those programmers choosing to invest the extra effort needed to
correctly use such facilities can do so with less overhead than
previously required, and will see performance improvement.

Esthetics:

Neutral.  This proposal is syntactically consistent with the existing
keyword arguments to sequence, list, and string functions.  However, it
adds to the plethora of keyword arguments already present for many of
these functions.  Stylistically, it provides stronger support for the
programming style embodied by the use of the "destructive" versions of
many Common Lisp functions.

Discussion:

My experience in several commercial Lisp applications is that avoiding
storage allocation by using programming techniques such as this may be
critical to the success of a project.  The current sequence, list, and
string functions encourage an "expensive", albeit easy to use,
programming style that invites the creation of programs whose
performance suffers greatly in comparison to a C program written to
solve the same problem.  This applies particularly to string-hcaking
programs written using Common Lisp versus those written using the
standard string library for C.

A related issue: would adoption of these arguments eliminate the need
for parallel destructive versions for some of the affected functions?
(E.g., does REMOVE with a :MODIFY argument eliminate the need for
DELETE.)

Pierson supports the general ideas of an earlier version of this
proposal, but believes the proposal itself should be in the form that
would appear in a revised manual.

Several people responded in the negative to Fahlmans suggestion that
perhaps the new LOOP facility would relieve the need for these
extensions.  Fahlman is ready to support the proposal in principle if
certain details are fixed up.

Moon does not like returning unused conses as a second value as proposed
in previous versions of this proposal.  He also does not believe that
UNION, INTERSECTION, SET-DIFFERENCE, and SET-EXCLUSIVE-OR derive much
benfit from previously suggested extensions similar to those in this
proposal.  Neither should the functions SUBSEQ, COPY-SEQ, COPY-LIST, and
BUTLAST be modified, because the functionality is already available from
REPLACE, he claims.  He thought :TARGET should be changed to :OVERWRITE.

All of the above comments were made prior to version three of this
proposal.

MAPCAR and friends might be considered candidates for modification in
this proposal.  However, these mapping functions likely will be made
obsolete by LOOP if they are not already obsolete.