Re: Tokenization and READ-MACROs

To: Charles Dolan <cpd@UCLA-CS.ARPA>
Subject: Re: Tokenization and READ-MACROs
From: Jonathan Rees <Rees@YALE.ARPA>
Date: Sun, 6 Nov 83 23:02:28 EDT
Cc: dyer@UCLA-CS.ARPA, T-Users@YALE.ARPA
In-reply-to: Charles Dolan <cpd>, Mon, 31 Oct 83 20:24:55 PST
    Date:           Mon, 31 Oct 83 20:24:55 PST
    From:           Charles Dolan <cpd>
    Subject:        Tokenization and READ-MACROs
    
    ... The READ-MACROS in the (MAKE-STANDARD-READ-TABLE) are objects.
    This means that if you replace some of the special characters,
    
            #\(,#\),#\[,#\],etc
    
    your replacement procedure may not respond to the same operations
    as the ones that come standard.
    
    For example, the tokenizer asks the "object" in the READ-TABLE if it
    is a separator when it is trying to find the end of an atom. Your 
    procedure probably won't answer T since you don't even know what
    that operation is. Neither do I.
    
    There is hope. You should make your procedure an object (See page 27 of
    the T manual for the equivalence of OBJECT and LAMBDA notation.) The
    thing you put in the READ-TABLE should be "(JOIN new-object old-object)"
    (order is important).

    This object will respond properly to the operation inquiring about
    delimiter-ness.

    NOTE: this will not work in TAU 2.7 because they are removing
    JOIN which is sort of buggy. However JOIN works on this case.
    
T 2.7 will have a released way to make a read macro be a delimiting
read macro, like (, ), and ;.  This will be described in the release
notes and in the next edition of the manual.  In the meantime: your
analysis is basically correct; the operation of interest, for the
curious, is called CONSTITUENT-SYNTAX?.  As described in the current
manual, characters are classified as either "delimiter" or "constituent"
characters (with respect to a given read table).  Ordinarily, any
read macro character is a constituent character, which means that
it will not break an "atom".  However, if the procedure for the read
macro handles the CONSTITUENT-SYNTAX?  operation (which returns true
by default) by returning false, then the character will be a delimiter.
(The released version of this predicate will likely have the reversed
sense.)

In any case, I answer this message not to explain read macros but
more because I want to try to dispel a common confusion about the
meaning of the term "object".  I guess the manual does a pretty poor
job of explaining it (what definition there is is on page 7), but
I think the term is at least used consistently.  Every datum, be it
a number, list, procedure, or anything else, is an object.  The common
confusion is that the value of an (OBJECT ...) form is a thing of
some special type "object," and that these "objects" are different
from the other things manipulated by programs.

This is not the case.  For example, when the manual says (page 27) that

    (OBJECT (LAMBDA ...))
    
is the same as

    (LAMBDA ...),
    
it is not kidding.  That is, wherever you can write one, you can write the
other.  (You can even write (OBJECT (OBJECT (LAMBDA ...))).)  Both
expressions yield objects, objects which also happen to be procedures.

If you think that the things you create by evaluating OBJECT-expressions
are different from the things that the system creates when you call
routines like CONS or +, think again.  In principle, everything could
be made from OBJECT-expressions, and there would be no way to distinguish
a T implementation in which this was the case from one that wasn't.
If CONS actually was defined to be

    (DEFINE (CONS X Y)
      (OBJECT NIL
              ((CAR SELF) X)
              ((CDR SELF) Y)
              (((SETTER CAR) SELF VAL) (SET X VAL))
              (((SETTER CDR) SELF VAL) (SET Y VAL))
              ((PAIR? SELF) T)))

then you wouldn't be able to tell the difference.  (You might notice that
CAR and CDR were operations - but on the other hand, the implementation
might protect you from that knowledge by doing

    (DEFINE (USER-VISIBLE-CAR OBJ) (CAR OBJ))
    (DEFINE (USER-VISIBLE-CDR OBJ) (CDR OBJ))
    
and giving you pointers to THESE routines under the names CAR and CDR.)
Similarly, for all a user can tell, + might be implemented by something
along the lines of

    (DEFINE (+ NUM1 NUM2)
      (COND ((ZERO? NUM2) NUM1)
            (ELSE (SUCCESSOR (+ NUM1 (PREDECESSOR NUM2))))))

    (DEFINE (SUCCESSOR NUM)
      (OBJECT NIL
              ((PREDECESSOR SELF) NUM)
              ((ZERO? SELF) NIL)))

    (DEFINE-OPERATION (ZERO? NUM))

    (DEFINE 0 (OBJECT NIL ((ZERO? SELF) T)))

    (DEFINE 1 (SUCCESSOR 0))  ;etc.

and, semantically speaking, there would be no way to tell.  In fact,
all the "types" which are thought of as "primitive" in T, are defined
in terms of OBJECT either in reality or in essence.  Things which
really are primitive at the lowest levels, like pairs and small
integers, actually could be implemented with OBJECT forms, but
have been done differently only for the sake of efficiency.
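
The pair example above can be tried without disturbing the real CONS.
What follows is only a sketch, assuming DEFINE-SETTABLE-OPERATION
behaves as the manual describes.  The names MY-CONS, MY-CAR, MY-CDR,
MY-PAIR?, and P are made up for illustration and are not part of T.

    ;; Hypothetical operations standing in for CAR, CDR, and PAIR?.
    (DEFINE-SETTABLE-OPERATION (MY-CAR OBJ))
    (DEFINE-SETTABLE-OPERATION (MY-CDR OBJ))
    (DEFINE-OPERATION (MY-PAIR? OBJ) NIL)    ; default method: answer false

    ;; Same shape as the CONS definition above, under the made-up names.
    (DEFINE (MY-CONS X Y)
      (OBJECT NIL
              ((MY-CAR SELF) X)
              ((MY-CDR SELF) Y)
              (((SETTER MY-CAR) SELF VAL) (SET X VAL))
              (((SETTER MY-CDR) SELF VAL) (SET Y VAL))
              ((MY-PAIR? SELF) T)))

    (DEFINE P (MY-CONS 1 2))
    (MY-CAR P)             ; handled by P's MY-CAR method
    (SET (MY-CAR P) 10)    ; goes through P's (SETTER MY-CAR) method
    (MY-PAIR? P)           ; handled by P's own method
    (MY-PAIR? 17)          ; handled by the default method

Nothing in the calls at the bottom reveals that P was built from an
OBJECT-expression rather than by some built-in pair constructor.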
  
So it is vacuous to say that "X is an object", because everything is.
There is no way to "make a procedure be an object," because all
procedures already are objects.  What is not vacuous is to say that
"X is an object which handles ...  by doing ...."  E.g.:  An input stream
is an object which handles READC by returning the next character in
some sequence.  The procedure returned by
(OBJECT (LAMBDA ...) ((FOO SELF) ...)) handles the FOO operation.
And so on.
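
To make "handles ... by doing ..." concrete, here is a small sketch.
The operation DELIMITS? and the variables PLAIN and FANCY are invented
for illustration and are not released names in T; only
DEFINE-OPERATION, OBJECT, and LAMBDA come from the manual.

    ;; Hypothetical operation whose default method answers false.
    (DEFINE-OPERATION (DELIMITS? OBJ) NIL)

    (DEFINE PLAIN (LAMBDA (X) X))            ; no method clauses at all
    (DEFINE FANCY (OBJECT (LAMBDA (X) X)
                          ((DELIMITS? SELF) T)))

    (DELIMITS? PLAIN)    ; handled by the default method
    (DELIMITS? FANCY)    ; handled by FANCY's own method clause
    (DELIMITS? 17)       ; default method again: a number is an object too
    (FANCY 'A)           ; FANCY is still called like any other procedure

All three arguments are objects; they differ only in which operations
they handle and how.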

So, to make a short story long, the way to define a delimiting read
macro is with

    (SET (READ-TABLE-ENTRY ...)
         (OBJECT (LAMBDA ...)
                 ((CONSTITUENT-SYNTAX? SELF) NIL)))

One could define an ordinary read macro with

    (SET (READ-TABLE-ENTRY ...)
         (OBJECT (LAMBDA ...)
                 ((CONSTITUENT-SYNTAX? SELF) T)))

but here the CONSTITUENT-SYNTAX? method is unnecessary, since that
operation returns true by default; so

    (SET (READ-TABLE-ENTRY ...)
         (OBJECT (LAMBDA ...)))

is equivalent; but then it is redundant to say (OBJECT ...) when there
are no method clauses; so that is why one usually writes simply

    (SET (READ-TABLE-ENTRY ...)
         (LAMBDA ...))

I don't mean to come down hard on your usage; I'm just trying to clarify
something which the manual doesn't make clear, and about which users
are justifiably confused.

People should feel free to send comments about the manual or language
to me and/or to Norman Adams.  We get relatively little direct feedback
from users considering how many problems there are with the system
and its documentation.  For example, I can't remember having heard
any complaints about there being no released way to determine
delimiterness of readmacros, even though I know that this affects
a number of people.  We may often reply with unsatisfactory answers,
but don't let that stop you.  Feedback is essential, and better too
much than too little.

Perhaps there should be yet another mailing list for general questions
and requests, since T-Bugs is really only appropriate for bug reports,
T-Users is intended for announcements only, and T-Discussion is intended
only for grander issues (?).  Maybe T-Implementors, or T-Documentors,
or something like that.

                            - Jonathan