
Oh no, it's the start/end versus start/count controversy

Well, we've gotta decide what to do about the start/end versus
start/count problem.  Maybe this is a matter of religion but somehow
we've got to cope with it.  I agree with whoever it was who recently
said here that we want to support both options; but this won't come
easily.  The problem is names.

To review, the problem is with things like SUBSEQ, SUBSTRING, and
SUBBITV.  (A BITV is a bit-vector; ignore that choice of type name for
now, that's also up in the air.)  When extracting a sub-sequence, do we
designate the range to be extracted by giving the start element index
and end index, or by a start index and a count of the number of elements
to be extracted?  (Note that zero-origin is definitely decided upon, and
also that in the start/end case, the end is the index of the first
element not selected, not that of the last element selected.)
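
To make the two conventions concrete, here is a minimal sketch in Python
rather than T.  The names subseq_count and subseq_end are invented for
illustration and are not naming proposals.  Both are zero-origin, and in
the start/end form the end is the index of the first element not selected.

```python
def subseq_count(seq, start, count):
    """Copy COUNT elements of SEQ beginning at zero-origin index START."""
    if start < 0 or count < 0 or start + count > len(seq):
        raise IndexError("range out of bounds")
    return seq[start:start + count]


def subseq_end(seq, start, end):
    """Copy elements from START up to, but not including, END."""
    if not 0 <= start <= end <= len(seq):
        raise IndexError("range out of bounds")
    return seq[start:end]


s = "controversy"
# The two forms designate the same range whenever end == start + count.
assert subseq_count(s, 3, 4) == subseq_end(s, 3, 7) == "trov"
```

Either form is easily derived from the other, so the issue really is
which one gets the short name.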

The Common Lisp "committee," and the NIL "committee" before that,
discussed this question at length.  NIL decided on start/count, Common
Lisp (and Lisp Machine) on start/end.  Start/count is nicer for hardware
like the VAX which prefers things in that form, and is also nicer for
things like bit vectors where you often want to extract a field of known
width.  However, the Lisp Machine people are adamant that start/end is
much more convenient from the programmer's point of view, at least in
the case of strings.

Given that we decide to support BOTH start/count and start/end in a
clean way, we have the naming problem to cope with.  Yuck!  One really
wants to use "subsequence" or "subseq" to refer to a subsequence, but
what to call the two different extraction primitives?  Which case does
SUBSEQ refer to, and what should the other case be called?  The other
for example, is out).

At least in the SUBSEQ case, I suggest the alternate names type-SUBSEQ
=> SUBtype (e.g., STRING-SUBSEQ becomes SUBSTRING) for the type-specific
routines, as convenient and natural abbreviations.

We also have the problem of names for shared versus nonshared
subsequences to cope with... for example, a shared subvector cannot be
implemented as a vector, because vectors, unlike strings, do not have
headers; either you get a new, unshared vector, or you get some strange
object which acts like a sequence but isn't a vector.  So unshared
vectors are natural.  Whereas shared substrings are efficiently
implemented, so one would like to use them.
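
The shared/unshared distinction can be sketched as follows.  This is
Python, not T, and it only illustrates the semantics: a bytearray slice
stands in for a copied subsequence, and a memoryview stands in for a
shared one.  Nothing here is meant to suggest how T would implement
either.

```python
buf = bytearray(b"start/count")

copied = buf[6:11]              # a fresh, unshared bytearray
shared = memoryview(buf)[6:11]  # a window onto buf's own storage

shared[:] = b"COUNT"            # write through the shared slice

assert bytes(buf) == b"start/COUNT"       # the original sees the change
assert bytes(copied) == b"count"          # the copy is unaffected
assert not isinstance(shared, bytearray)  # the shared slice is a different kind of object
```

Note that the shared slice acts like a sequence but is not the same type
as the thing it was taken from, which is exactly the awkwardness
described above for vectors.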

ALGOL68 uses the term "slice" to refer to a shared sub-array...  so I'll
suggest the following names, just to get people started flaming...  I
know the @'s are absurd (T philosophy: use as few special characters as
possible), but here goes.

(SLICE   seq start count)  - shared
(SLICE@  seq start end)
(SUBSEQ  seq start count)  - copied
(SUBSEQ@ seq start end)

Plus (type-SLICE seq ...), for important type-specific routines,
(SUBtype seq ...), etc.  (SUB@type? SUBtype@? yug.)

The string routines of recent messages posted to this distribution list

Personally, I prefer the start/count form; that's why I made the OTHER
form have the ugly @'s in them.

More naming problems: what to call (a) the routine corresponding to
SUBBITV (BITV-SUBSEQ) which returns the result not as another sequence,
but as an integer?  And (b) the corresponding routines which extract not
from a bit vector, but from an integer?  As far as I can tell, Common Lisp and
Lisp Machine Lisp have nothing corresponding to (a) (the Lispm has no
bit vectors per se, just ART-1B arrays... sigh...); corresponding to (b)
are LDB and DPB, which in Maclisp have the alternate forms LOAD-BYTE and
DEPOSIT-BYTE (the field is specified differently).  NIL, I think,
provides (a) and calls it NIBBLE, which seems a little bizarre to me.
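
For concreteness, here is a rough Python sketch of the operations in
question.  The functions ldb and dpb below imitate the Lisp operations of
the same names for case (b), except that size and position are passed as
separate arguments instead of Common Lisp's byte specifier.  The name
bits_to_integer for case (a) is hypothetical, as is the choice that the
bit at the start index is the least significant one.

```python
def ldb(size, position, n):
    """Return the SIZE-bit field of integer N whose low bit is at POSITION."""
    return (n >> position) & ((1 << size) - 1)


def dpb(value, size, position, n):
    """Return N with its SIZE-bit field at POSITION replaced by VALUE."""
    mask = ((1 << size) - 1) << position
    return (n & ~mask) | ((value << position) & mask)


def bits_to_integer(bits, start, count):
    """Case (a): read COUNT bits of a bit vector as an integer.

    Assumption for this sketch: bits[start] is the least significant bit.
    """
    result = 0
    for i in range(count):
        result |= bits[start + i] << i
    return result


assert ldb(4, 4, 0xAB) == 0xA
assert dpb(0xF, 4, 0, 0xA0) == 0xAF
assert bits_to_integer([1, 0, 1, 1, 0, 0], 1, 3) == 6
```

All three take a start and a width, which is the start/count form.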

Interesting that Common Lisp uses start/count for LDB but start/end

--------------------

The concept of sequence and generic sequence operations for LISP
originated (I think) with the NIL project; a good explanation and
rationale for this mess is given in the draft Common Lisp manual,
and a poor explanation is given in the new draft T manual.  I should
explain it here but don't have the energy at the moment.

Apparently I've decided that generic sequence operations are a good idea
for T.  I don't want to go whole-hog with this idea like Common Lisp
has, but I think it's a good overall guiding principle.  More on this
later if people are confused & have questions.

(About my views on Common Lisp's sequences:  I think I'm generally
sympathetic.  There's a lot of good stuff there.  Common Lisp has even
gone so far as to make things like LENGTH, APPEND, and REVERSE generic
across sequence types, something I find myself wanting to do for T with
increasing frequency; but my list-oriented constituency will scream
gratuitous generality, especially before I've added some type analysis
and declaration processing to the compiler.  But Common Lisp provides so
much, so many different kinds of iterators and such, that I feel certain
it's left out a lot, and that some unifying concepts have eluded all
who've thought about the problem.  The implementation is going to be
huge and complicated, and I try to avoid that kind of thing.  I was the
first person to try to implement NIL's generic sequence functions and
felt sure there was something wrong then, and I still feel that way,
even though I've been thinking about the problem for two and a half

Perhaps all these semantic & naming difficulties are evidence that the
idea of sequence is either bogus or intractable; maybe strings and bit vectors
really ARE different things, and we shouldn't be misled by their
superficial similarity.  I sincerely hope not...

Whew!  Sorry about all this... had to get it off my chest...