[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Issue: SEQUENCE-FUNCTIONS-EXCLUDE-ARRAYS (Version 5)



This is another version that I apparently neglected to mail out.

!
Issue:        SEQUENCE-FUNCTIONS-EXCLUDE-ARRAYS
References:   Sequences (pp245-261)
              Issue REMF-DESTRUCTURING-UNSPECIFIED
                (discussion of NREVERSE, NSUBSTITUTE)
              Issue AREF-1D
Category:     ENHANCEMENT
Edit history: 05-Feb-87, Version 1 by Touretzky
              28-Apr-87, Version 2 by Pitman (variations on the theme)
              26-Oct-87, Version 3 by Masinter (clean up for release)
              11-Nov-87, Version 4 by Masinter (respond to comments)
               5-Feb-88, Version 5 by Masinter

Description:

Common Lisp provides many useful operations on lists and vectors which don't
apply to arrays.

For example, one can FILL a vector with 0's, but not an array. One can REPLACE
the contents of one vector with another, but one can't do this for arrays.  One
can verify that EVERY element of a vector has some property, but one can't do
this for arrays, and so on.

The programmer who wishes to use arrays instead of vectors must give up most of
the useful tools CLtL provides for manipulating sequences, even though there is
no intuitive reason why operations like FILL, REPLACE, and EVERY shouldn't work
on arrays.

Common Lisp already provides a facility called "displaced arrays" which can be
used to overlay one array on top of a portion of another, even if the two are of
different ranks, so that the two share storage or use the awkward convention of
creating a displaced array to the operand. However, creating a displaced array
merely to perform FILL, REPLACE or EVERY is awkward.

Proposal (SEQUENCE-FUNCTIONS-EXCLUDE-ARRAYS:GENERALIZE):

1) Extend the definitions of sequence functions that either return their
argument sequences or return non-sequences so that they also allow arrays of
rank other than 1. These functions should treat array arguments as vectors
displaced to the array storage (in row-major format). The affected functions are
LENGTH, ELT, COUNT, FIND, POSITION, SOME, EVERY, NOTANY, NOTEVERY, REDUCE,
SEARCH, MISMATCH, FILL, REPLACE, SORT.

For example, suppose A is a 3x2x7 array. (LENGTH A) should return 42, and (ELT A
7) should return the same element as (AREF A 0 1 0).  :START and :END keywords
would be interpreted relative to the vector, as would the results returned by
POSITION and SEARCH.

2) Extend the definitions of sequence functions whose result should be the same
shape as but not necessarily EQ to some argument. These functions should deal
with array arguments by returning an array of the same shape as the argument,
and operate on their argument in row-major order. The affected functions are
SUBSTITUTE, NSUBSTITUTE, REVERSE, NREVERSE, SUBSTITUTE-IF, NSUBSTITUTE-IF,
SUBSTITUTE-IF-NOT, NSUBSTITUTE-IF-NOT and MAP.

3) Sequence functions that modify the number of elements in the array remain
unchanged: it is an error to pass arrays of rank other than 1. (The functions
are not extended because the shape would be undefined.) The unaffected functions
are SUBSEQ, COPY-SEQ, CONCATENATE, MERGE, REMOVE, REMOVE-DUPLICATES, DELETE,
DELETE-DUPLICATES.

Note that EQUALP does -not- treat arrays as vectors.  It is not a sequence
function, and it already has a well-defined behavior for arrays. To test whether
the arrays A and B, of different shapes, have the same elements, one might
write:

	(AND (= (LENGTH A) (LENGTH B)) (EVERY #'EQL A B)).

Rationale:
  
This is a useful upward compatible extension with relatively low adoption cost.
  
Cost to implementors:
  
This would involve a lot of changes to functions, but all of the changes are
likely to be minor. The presence of displaced arrays in the language already
guarantees that the internal storage format needed to back up these proposed
changes is already in place. 

Cost to Users:
  
This change is upward compatible; current user code should run unmodified.
  
Benefits:
  
This proposal adds natural support for multi-dimensional arrays. Currently,
users must write nested DO loops every time they want to perform an array
operation equivalent to FILL, REPLACE, REDUCE, etc., or else they buld
displaced vectors by hand and pass them to the sequence functions when
necessary. With this proposal, users of arrays do not have to use home-grown
utilities to duplicate functionality already primitively provided to users of
arrays. The sequence functions become useful in a variety of new situations.
  
Aesthetics:
  
This proposal extends sequence functions to cover arrays while neatly avoiding
the temptation to turn Common Lisp into a half-baked APL. Instead of trying to
provide a full set of array handling primitives (which would be needed to take
arbitrary k-dimensional slices out of n-dimensional arrays, or to apply an
operator across a specific dimension of a multidimensional array), it requires
just one rule:

    treat arrays as displaced vectors where it is well-defined.

Current Practice:

We know of no current implementation of this proposal.
 
Discussion: 

This issue was discussed by the cleanup committee at length; what follows is
only a brief summary of the discussion.

An alternative considered was to only affect those functions which didn't
explicitly depend on the shape of the array; that is, to modify  COUNT, SOME,
EVERY, NOTANY, NOTEVERY, FILL, REPLACE, SUBSTITUTE, NSUBSTITUTE, and MAP, and
expressly forbid arrays as arguments to other sequence functions, including
LENGTH, ELT, FIND, POSITION, REDUCE, SEARCH,  MISMATCH, REVERSE, NREVERSE, SORT,
MAP, as well as SUBSEQ, COPY-SEQ, CONCATENATE, MERGE, REMOVE, REMOVE-DUPLICATES,
DELETE, DELETE-DUPLICATES. This would be less controversial, since it includes
only those functions which do not deal with positional information. Some hedging
over REDUCE and FIND, which often have non-positional uses, were considered.

Touretzky supports SEQUENCE-FUNCTIONS-EXCLUDE-ARRAYS:GENERALIZE. He's been
building displaced vectors to pass to sequence functions when necessary and
really dislikes it.

We considered but discarded as unworkable an alternative where POSITION and FIND
might deal with "positions" as lists of array subscripts.

At one point, this proposal included an extension to COERCE to allow COERCE to
translate from array types to sequences, but it was thought better to separate
out COERCE. We considered a proposal to only introduce a change to COERCE, and
require users who want to operate on arrays explictly perform, e.g.,  (COUNT
item (COERCE x 'SEQUENCE)). This alternative would have the lowest adoption cost
but was deemed awkward.

Sequence functions like FILL and COUNT are already generalized to arrays in
non-Lisp contexts; in English we use the generalized forms all the time, e.g.,
"count the number of 1's in this matrix."

There is some concern that this proposal makes the requirement for row-major
ordering of internal storage for arrays even more evident. While such an
internal ordering is implied by the ability to create displaced arrays and the
AREF-1D proposal, this proposal adds yet another constraint.

The general reason for opposing this change is that it adds more mechanism than
it is worth. The general reason for liking it is that it adds generality at
little cost.