
Re: STRING subsequencing functions, etc



    Date: 12 February 1981 2335-EST (Thursday)
    From: Guy.Steele at CMU-10A
    I observe that as a general rule string functions specifying substrings
    use a start position and end position on the LISP Machine, but in NIL
    they take a start position and a count. . . .   LM has
    SUBSTRING and NIL has STRING-SUBSEQ, etc.
This "general rule" is a false cognate -- NIL compatibly has all the LISPM 
string-specific functions except NSUBSTRING (yes, I'm aware that this
wasn't mentioned in MC:NIL;NEWFUN > -- that document is a bit out-of-date,
and for the past year we've been coding so much that there seems to be
little interest in keeping it fully up-to-date.)  In addition, there are all
the newly-invented generic sequence functions, whose names don't "cross 
paths" with the LISPM names;  it is the latter series which has the PL/I
convention for substring denotation.  Thus, STRING-SUBSEQ is just SUBSEQ 
restricted to STRINGs, and differs from SUBSTRING only in the way
of specifying the substring length.
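
The difference between the two conventions comes down to how the third argument is read. A minimal sketch in Python (not NIL or LISPM code): the function names are hypothetical stand-ins, and zero-based indexing with an exclusive end position is assumed.

```python
def substring_by_end(s: str, start: int, end: int) -> str:
    """Start/end convention: characters from start up to, not including, end."""
    if not 0 <= start <= end <= len(s):
        raise ValueError("need 0 <= start <= end <= len(s)")
    return s[start:end]


def subseq_by_count(s: str, start: int, count: int) -> str:
    """Start/count convention: count characters beginning at start."""
    if start < 0 or count < 0 or start + count > len(s):
        raise ValueError("need start >= 0, count >= 0, start + count <= len(s)")
    return s[start:start + count]


# The same substring under both conventions, related by end = start + count.
print(substring_by_end("SUBSEQUENCE", 3, 6))
print(subseq_by_count("SUBSEQUENCE", 3, 3))
```

Both calls select the same three characters; only the meaning of the last argument changes.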

    I think LM has STRING-EQUAL and NIL has STRING-EQUALP, .  .  .

I don't think NIL has STRING-EQUALP really;  it does have STRING-EQUAL
just as on the LISPM.  For a long time, there has been a lobbying effort 
here (and I believe from you too, GLS) to ensure that EQUAL doesn't use
this "friendly function" when applied to strings -- EQUAL shouldn't 
ignore case.  The NIL function which is almost equivalent to the
case-sensitive comparison is STRING-MISMATCHQ.
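
To make the case-ignoring versus case-sensitive distinction concrete, here is a rough Python sketch. The exact behavior of STRING-MISMATCHQ is not spelled out in this message, so the sketch assumes a mismatch-style function that returns the position of the first difference, or nothing when there is none; `first_mismatch` and its `ignore_case` flag are hypothetical, and ASCII text is assumed.

```python
from typing import Optional


def first_mismatch(a: str, b: str, ignore_case: bool = False) -> Optional[int]:
    """Return the index of the first differing position, or None if there is none."""
    if ignore_case:
        # Case-ignoring comparison: fold both strings before comparing (ASCII assumed).
        a, b = a.upper(), b.upper()
    for i, (x, y) in enumerate(zip(a, b)):
        if x != y:
            return i
    if len(a) != len(b):
        # One string is a proper prefix of the other.
        return min(len(a), len(b))
    return None


print(first_mismatch("Foo", "FOO"))                    # case-sensitive
print(first_mismatch("Foo", "FOO", ignore_case=True))  # case-ignoring
```

Under this assumption, two strings are equal in the case-sensitive sense exactly when the function finds no mismatch.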
    If that's not enough, the function STRING does different things
    on the two machines (though I think they could be merged compatibly).
There isn't a definition for STRING in NIL yet -- but there is
TO-STRING, which does essentially the same thing;  in fact there are
a bunch of other functions TO-<mumble>, for various values of <mumble>,
and these were discussed about a year ago under the notion of 
"Coercion functions".  One bad feature of the LISPM definition for
STRING is that it is not symmetric with the functions LIST, VECTOR, etc.,
which take n arguments and compose them into a list, vector, etc.,
respectively;  by this analogy, one would think that
    (STRING ~A #/b 67. 'D)
should yield "AbCD".
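
The analogy with LIST and VECTOR can be sketched in Python as an n-argument constructor. This is an illustration only, not a proposed definition: the coercion rules are assumptions, with one-character Python strings standing in for strings, characters and symbol names, and an integer standing in for a character code.

```python
def string(*parts) -> str:
    """Compose the arguments into one string, the way LIST composes a list.

    Assumed coercions (hypothetical): an int is read as a character code,
    a str is used as it stands.
    """
    pieces = []
    for part in parts:
        if isinstance(part, int):
            pieces.append(chr(part))   # 67 is the code for "C"
        elif isinstance(part, str):
            pieces.append(part)
        else:
            raise TypeError(f"cannot coerce {part!r} to a string")
    return "".join(pieces)


print(string("A", "b", 67, "D"))
```

Each argument contributes its own characters, in order, just as each argument to LIST contributes one element.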


NIL's decision to use a COUNT indicator for sub-sequences, rather than
a BOUNDARY, was influenced partly by other languages and partly by
the format of some existing machine instructions which take a COUNT -- 
e.g., all the string instructions on the VAX take an operand which is 
a "length count", and the move-characters instruction of the 370 also
does this.  Thus, STRING-SEARCHQ could be open-coded for the VAX
using the MATCHC instruction, but STRING-SEARCH (as defined in the LISPM)
would require the BOUNDARY/COUNT patch-up at all usages.  As it 
happens, since the LISPM definition for STRING-SEARCH used to use
a case-ignoring comparison, it couldn't have been open-coded anyway.
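
The BOUNDARY/COUNT patch-up mentioned above can be sketched as follows. This is a Python model, not VAX or Lisp code: `search_counted` is a hypothetical stand-in for a primitive that takes a length count, and `search_bounded` is a hypothetical boundary-style interface layered on top of it.

```python
from typing import Optional


def search_counted(pattern: str, text: str, start: int, count: int) -> Optional[int]:
    """Count-style primitive: search the count characters of text beginning at start."""
    index = text.find(pattern, start, start + count)
    return None if index < 0 else index


def search_bounded(pattern: str, text: str, start: int, end: int) -> Optional[int]:
    """Boundary-style interface built on the count-style primitive."""
    count = end - start  # the patch-up needed at every usage
    return search_counted(pattern, text, start, count)


print(search_counted("CD", "ABCDEF", 1, 4))
print(search_bounded("CD", "ABCDEF", 1, 5))
```

A count-style function can hand its arguments straight to the primitive, whereas the boundary-style one pays for a subtraction each time it is called.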