
Issue: READ-CASE-SENSITIVITY (Version 3)



I am not going to be able to come to the meeting but would
still like to see this issue considered.  I wasn't present
for the discussion at the last meeting, but my understanding
is that the committee wanted to see the issue again with
only a single proposal.  Unfortunately, I don't know which
proposal (keyword or function) most people prefer.

It has also been remarked that :INVERT is somewhat strange in that it
would have Zebra read as zEBRA; and it was suggested that inversion
should happen only if the entire name were single-case.

Unfortunately, the processing has to happen a character at a time,
because READ has to do it only for characters that are not escaped.
For example, |zebra| should always read as zebra.

Apart from that, very little has been said about this proposal.
I hope that anyone who does have objections will be able to raise
them by mail before the meeting.

Issue:        READ-CASE-SENSITIVITY

Forum:        Cleanup

References:   CLtL p 334 ff: What the Read Function Accepts,
                especially p 337, step 8, point 1.
              CLtL p 360 ff: The Readtable
              COPY-READTABLE (CLtL, p 361)
              *PRINT-CASE* (CLtL, p 372)

Category:     ADDITION/CHANGE

Edit history: Version 1, 15-Feb-89, by Dalton
              Version 2, 23-Mar-89, by Dalton,
                (completely new proposal after comments from
                 Pitman, Gray, Masinter, and R.Tobin@uk.ac.ed)
              Version 3, 16-Jun-89, by Dalton
                (very minor changes in presentation
                 and some additions to the discussion)

Problem Description:

  The Common Lisp reader always converts unescaped constituent
  characters to upper case.  (See CLtL, p 337, step 8, point 1.)
  This behavior is not always desirable.

  1.  Lisp applications often use the Lisp reader to read their data.
  This is often significantly easier than writing input routines
  from scratch, especially if the input can be structured as lists.
  However, certain applications want to make use of case distinctions,
  and Common Lisp makes this unreasonably difficult.  (You must define
  every letter as a read macro and have the macro function read the
  rest of the symbol, or else you must write a reader from scratch.)

  2.  Some programming languages distinguish between upper and lower
  case in identifiers, and useful conventions are often built around
  such distinctions.  For example, in C, constants are often written
  in upper case and variables in lower.  In Mesa(?) and Smalltalk(?),
  a capital letter is used to indicate the beginning of a new word
  in identifiers made up of several words.  In Edinburgh Prolog,
  variables begin with upper-case letters and constant symbols do
  not.  The case-insensitivity of the Common Lisp reader makes
  it difficult to use conventions of this sort.

Proposal (READ-CASE-SENSITIVITY:READTABLE-KEYWORDS)

  Define a new settable function, (READTABLE-CASE <readtable>), to
  control the reader's interpretation of case.  The following values
  may be given:

    :UPCASE   --  convert unescaped characters to upper-case, as now.
    :DOWNCASE --  convert unescaped characters to lower-case.
    :PRESERVE --  don't convert, leaving lower-case letters in lower
                  case and upper-case characters in upper case.
    :INVERT   --  convert lower-case to upper and upper-case to lower.

  COPY-READTABLE copies the setting of READTABLE-CASE.  The value of
  READTABLE-CASE for the standard readtable is :UPCASE.
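
  The four settings can be modelled as a per-character conversion
  applied to each unescaped constituent character.  The following is
  only a sketch of that reading of the proposal; the names
  TRANSLATE-UNESCAPED-CHAR and TRANSLATE-TOKEN are hypothetical
  helpers, not part of the proposal or of any implementation.

```lisp
;; Hypothetical helper (not part of the proposal): the conversion that
;; each READTABLE-CASE setting applies to one unescaped character.
(defun translate-unescaped-char (ch mode)
  (ecase mode
    (:upcase   (char-upcase ch))
    (:downcase (char-downcase ch))
    (:preserve ch)
    (:invert   (cond ((upper-case-p ch) (char-downcase ch))
                     ((lower-case-p ch) (char-upcase ch))
                     (t ch)))))

;; Hypothetical helper: apply the conversion to a token that contains
;; no escape characters.
(defun translate-token (string mode)
  (map 'string
       #'(lambda (ch) (translate-unescaped-char ch mode))
       string))
```

  Under these assumptions, (translate-token "Zebra" :invert) yields
  "zEBRA", in agreement with the Test Case section.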

  The READTABLE-CASE of a readtable also has significance when
  printing.  The case in which letters are printed is determined as
  follows:

    When READTABLE-CASE is :UPCASE, upper-case letters are printed in
    the case specified by *PRINT-CASE*.

    When READTABLE-CASE is :DOWNCASE, lower-case letters are printed in
    the case specified by *PRINT-CASE*.

    When READTABLE-CASE is :PRESERVE, letters are printed in their own
    case.

    When READTABLE-CASE is :INVERT, the case of all letters is inverted.

  (The behavior when *PRINT-CASE* is :CAPITALIZE is like :UPCASE for
  the first character and :DOWNCASE for the rest.)

  The rules for escaping letters are also affected by the READTABLE-CASE.
  If *PRINT-ESCAPE* is true, letters are escaped as follows:

    When READTABLE-CASE is :UPCASE, all lower-case letters must be
    escaped.
    When READTABLE-CASE is :DOWNCASE, all upper-case letters must be
    escaped.
    Otherwise, no letters need be escaped.
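
  As a rough illustration, the escaping rule for letters can be
  written as a predicate on a symbol name.  LETTERS-NEED-ESCAPE-P is a
  hypothetical name used only for this sketch; it considers only the
  case of letters and ignores every other reason a printer might have
  for escaping a name.

```lisp
;; Hypothetical helper (not part of the proposal): true if some letter
;; in NAME must be escaped under the given READTABLE-CASE setting.
(defun letters-need-escape-p (name mode)
  (ecase mode
    (:upcase   (and (some #'lower-case-p name) t))
    (:downcase (and (some #'upper-case-p name) t))
    ((:preserve :invert) nil)))
```

  For example, under :UPCASE the name "Zebra" contains lower-case
  letters and so needs escaping, while "ZEBRA" does not.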

Proposal (READ-CASE-SENSITIVITY:READTABLE-FUNCTION)

  Define a new settable function (READTABLE-CHARACTER-TRANSLATION
  <readtable>) to control the reader's interpretation of unescaped
  constituent characters.  The value may be any function of type
  (FUNCTION (CHARACTER) CHARACTER).  Where the reader now converts
  such characters to upper case it should instead call the function
  that is the value of READTABLE-CHARACTER-TRANSLATION for the current
  readtable.  (See CLtL, page 337, step 8, point 1.)

  COPY-READTABLE copies the setting of READTABLE-CHARACTER-TRANSLATION.
  The value for the standard readtable is CHAR-UPCASE.

  The READTABLE-CHARACTER-TRANSLATION of a readtable also has
  significance when printing.  The printer recognizes certain functions
  which control the reader's interpretation of case and alters its
  behavior accordingly.  This behavior is given by the following
  correspondence between functions and the keywords described above.
  [This is just to avoid repeating a lot of text.]

    function           keyword
    CHAR-UPCASE        :UPCASE
    CHAR-DOWNCASE      :DOWNCASE
    IDENTITY           :PRESERVE
    CHAR-INVERT-CASE   :INVERT

  The function can be given either as a symbol or as one of the values
  #'CHAR-UPCASE, #'CHAR-DOWNCASE, #'IDENTITY, #'CHAR-INVERT-CASE.

  If the READTABLE-CHARACTER-TRANSLATION is not one of the functions
  listed above, letters are always printed in their own case (in
  particular, *PRINT-CASE* has no effect), and all characters in
  symbol names are escaped if *PRINT-ESCAPE* is true.

  Define a new function CHAR-INVERT-CASE of type (FUNCTION (CHARACTER)
  CHARACTER) analogous to CHAR-UPCASE and CHAR-DOWNCASE.  It attempts
  to convert its argument to upper-case if the argument is lower-case
  and to lower-case if the argument is upper-case.
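
  A minimal sketch of such a function follows, together with a helper
  that expresses the function/keyword correspondence given in the
  table above.  CHAR-INVERT-CASE is defined here in the user's own
  package only for illustration, and TRANSLATION-KEYWORD is a
  hypothetical name that does not appear in the proposal.

```lisp
;; Sketch of the proposed CHAR-INVERT-CASE.  Characters that are
;; neither upper-case nor lower-case are returned unchanged.
(defun char-invert-case (ch)
  (cond ((upper-case-p ch) (char-downcase ch))
        ((lower-case-p ch) (char-upcase ch))
        (t ch)))

;; Hypothetical helper: map a translation function, given as a symbol
;; or as a function object, to the corresponding keyword.  Returns NIL
;; for any function that is not in the table.
(defun translation-keyword (fn)
  (let ((fn (if (symbolp fn) (symbol-function fn) fn)))
    (cond ((eq fn #'char-upcase) :upcase)
          ((eq fn #'char-downcase) :downcase)
          ((eq fn #'identity) :preserve)
          ((eq fn #'char-invert-case) :invert)
          (t nil))))
```

  With this definition, (map 'string #'char-invert-case "Zebra")
  returns "zEBRA".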

Rationale:

  There are a number of different ways to achieve case-sensitivity.
  These proposals are fairly simple but provide all of the
  functionality that one could reasonably expect.

  By using a property of the readtable, we avoid introducing a new
  special variable.  Any code that wishes to control all of the
  reader's parameters already takes *READTABLE* into account.  A new
  special variable would require such code to change.

  :DOWNCASE is included for symmetry with :UPCASE.  :INVERT is
  included so that case conventions could be used in Common Lisp code
  without requiring that the names of symbols in the "LISP" package be
  written in upper case.  (Opinions vary as to whether it is advisable
  to use such conventions, but this proposal leaves that choice to the
  user.)

  In order to avoid complex interactions between the case setting of
  the readtable and *PRINT-CASE*, this proposal specifies a
  significance for *PRINT-CASE* only when the case setting is :UPCASE
  or :DOWNCASE.  The meaning of *PRINT-CASE* when the readtable
  setting is :DOWNCASE was chosen for its simplicity and for symmetry
  with :UPCASE while still being useful.

Test Case:

  ;; keyword version
  (let ((rt (copy-readtable nil)))
    (mapcar
      #'(lambda (case)
          (setf (readtable-case rt) case)
          (read-from-string "Zebra"))
      '(:upcase :downcase :preserve :invert)))

    => (ZEBRA |zebra| |Zebra| |zEBRA|) ;as printed with the standard
                                       ;readtable and *print-case* :upcase

Current Practice:

  While there may not be any current implementation that supports
  exactly this proposal, several implementations provide some means
  for changing case sensitivity.

  Franz Inc's ExCL has a function, EXCL:SET-CASE-MODE, that sets both
  the "preferred case" (the case of characters in the print names of
  standard symbols such as CAR) and whether or not the reader is case-
  sensitive.

  In Symbolics Common Lisp, the function SET-CHARACTER-TRANSLATION
  can be used to make the translation of a letter be that same letter,
  thus achieving case-sensitivity.

  Xerox Medley has a function for setting a readtable flag that
  determines case sensitivity.

Cost to Implementors:

  Fairly small.  The reader will be slightly slower and readtables
  will be slightly more complex.

Cost to Users:

  Slight.  Programmers must already take into account the possibility
  that *READTABLE* will be a non-standard readtable.  Case-sensitivity
  is no worse than character macros in this respect.

Cost of Non-Adoption:

  Applications that want to read mixed-case expressions will not
  be able to use the Common Lisp reader to do so (except, perhaps,
  by tortuous use of read macros).

  Programming styles that rely on case distinctions (without escape
  characters) will be effectively impossible in Common Lisp.

Benefits:

  Applications will be able to read mixed-case expressions.

  Programmers will be able to make use of case distinctions.

Aesthetics:

  For the proposals: 

    The language will have greater symmetry, because it will be
    possible to control the treatment of case on both input and output
    instead of only on output (as is now the case).

    The language will look less old-fashioned.

  Against the proposals:
  
    It is, perhaps, inconsistent to control case-sensitivity by a
    readtable operation when other aspects of the reader, such as the
    input base and the default float format (not to mention the
    package), are controlled by special variables.  However, it can be
    argued that character-level syntax is determined chiefly by the
    readtable.  Case-sensitivity can be seen as analogous to character
    macros in this respect.

  Keywords vs function

    The keyword proposal is somewhat simpler and, by being less
    powerful, avoids suggesting the possibility of more general
    character translation (for every character, say, rather than
    just for unescaped constituents).

    The function proposal is perhaps more elegant.

Discussion:

  Dalton supports both proposals but slightly prefers READTABLE-KEYWORDS.

  Version 1 of the proposal suggested a new global variable rather
  than a property of the readtable.  Pitman was strongly opposed to
  that proposal and gave convincing arguments that it should be
  dropped.  Gray suggested that the readtable property should be a
  function.

  It has been remarked that :INVERT produces somewhat strange
  results.  For example, Zebra reads as zEBRA.  It was suggested
  that inversion should happen only if the entire token was single-
  case.

  However, READ has to take escape characters into account (so that,
  for example, |zebra| always reads as zebra), and then it is
  difficult to know what rules to apply to the entire token.
  Moreover, the description of READ in CLtL does not provide a
  convenient place to insert processing of that sort (by the time
  the full token is considered, the escape characters have been
  forgotten).