[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Issue: READ-CASE-SENSITIVITY (Version 4)



Maybe this is OK.  If anyone has objections, I'd really like to hear
them before everyone goes off to the meeting.  (Um, I know it's rather
late.)

Issue:        READ-CASE-SENSITIVITY

Forum:	      Cleanup

References:   CLtL p 334 ff: What the Read Function Accepts,
                especially p 337, step 8, point 1.
              CLtL p 360 ff: The Readtable
              COPY-READTABLE (CLtL, p 361)
              *PRINT-CASE* (CLtL, p 372)

Category:     ADDITION/CHANGE

Edit history: Version 1, 15-Feb-89, by Dalton
              Version 2, 23-Mar-89, by Dalton,
                (completely new proposal after comments from
                 Pitman, Gray, Masinter, and R.Tobin@uk.ac.ed)
              Version 3, 16-Jun-89, by Dalton
                (very minor changes in presentation
                 and some additions to the discussion)
              Version 4, 22-Jun-89, by Dalton
                (removal of the FUNCTION proposal and a different
                 specification for :INVERT after discussion with Moon)

Problem Description:

  The Common Lisp reader always converts unescaped constituent
  characters to upper case.  (See CLtL, p 337, step 8, point 1.)
  This behavior is not always desirable.

  1.  Lisp applications often use the Lisp reader to read their data.
  This is often significantly easier than writing input routines
  from scratch, especially if the input can be structured as lists.
  However, certain applications want to make use of case distinctions,
  and Common Lisp makes this unreasonably difficult.  (You must define
  every letter as a read macro and have the macro function read the
  rest of the symbol, or else you must write a reader from scratch.)

  2.  Some programming languages distinguish between upper and lower
  case in identifiers, and useful conventions are often built around
  such distinctions.  For example, in C, constants are often written
  in upper case and variables in lower.  In Mesa(?) and Smalltalk(?),
  a capital letter is used to indicate the beginning of a new word
  in identifiers made up of several words.  In Edinburgh Prolog,
  variables begin with upper-case letters and constant symbols do
  not.  The case-insensitivity of the Common Lisp reader makes
  it difficult to use conventions of this sort.

Proposal (READ-CASE-SENSITIVITY:READTABLE-KEYWORDS)

  Define a new settable function, (READTABLE-CASE <readtable>) to
  control the reader's interpretation of case.  The following values
  may be given: :UPCASE, :DOWNCASE, :PRESERVE, and :INVERT.

    When the value is :UPCASE, unescaped constituent characters
    are converted to upper-case, as specified by CLtL on page 337.

    When the value is :DOWNCASE, unescaped constituent characters
    are converted to lower-case.

    When the value is :PRESERVE, the case of all characters remains
    unchanged.

    When the value is :INVERT, then if if all of the unescaped letters
    in the extended token are of the same case, those (unescaped)
    letters are converted to the opposite case.

  COPY-READTABLE copies the setting of READTABLE-CASE.  The value of
  READTABLE-CASE for the standard readtable is :UPCASE.

  The READTABLE-CASE of a readtable also has significance when
  printing.  The case in which letters are printed is determined as
  follows:

    When READTABLE-CASE is :UPCASE, upper-case letters are printed
    in the case specified by *PRINT-CASE*, and lower-case letters
    are printed in their own case.

    When READTABLE-CASE is :DOWNCASE, lower-case letters are printed
    in the case specified by *PRINT-CASE*, and upper-case letters
    are printed in their own case.

    When READTABLE-CASE is :PRESERVE, all letters are printed in their
    own case.

    When READTABLE-CASE is :INVERT, the case of all letters in single-
    case symbol names is inverted.  Mixed-case symbol names are printed
    as-is.

  (The behavior when *PRINT-CASE* is :CAPITALIZE is like :UPCASE for
  the first character and :DOWNCASE for the rest.)

  The rules for escaping letters in symbol names are also affected by
  the READTABLE-CASE.  If *PRINT-ESCAPE* is true, letters are escaped
  as follows:

    When READTABLE-CASE is :UPCASE, all lower-case letters must be
    escaped.

    When READTABLE-CASE is :DOWNCASE, all upper-case letters must be
    escaped.

    When READTABLE-CASE is :PRESERVE, no letters need be escaped.

    When READTABLE-CASE is :INVERT, all letters in all single-case
    symbol names must be escaped.
    
Rationale:

  There are a number of different ways to achieve case-sensitivity.
  This proposal is fairly simple but provides all of the functionality
  that one could reasonably expect.

  By using a property of the readtable, we avoid introducing a new
  special variable.  Any code that wishes to control all of the
  reader's parameters already takes *READTABLE* into account.  A new
  special variable would require such code to change.

  :DOWNCASE is included for symmetry with :UPCASE.  

  :INVERT is included so that case conventions can be used in Common
  Lisp code without requiring that the names of symbols in the "LISP"
  package be written in upper case.  (Opinions vary as to whether is
  is advisable to use such conventions, but this proposal leaves that
  choice to the user.)

  :INVERT has an effect only for single-case names so that mixed-
  case names can be interpreted in a more straightforward way.

  In order to avoid complex interactions between the case setting of
  the readtable and *PRINT-CASE*, this proposal specifies a
  significance for *PRINT-CASE* only when the case setting is :UPCASE
  or :DOWNCASE.  The meaning of *PRINT-CASE* when the readtable
  setting is :DOWNCASE was chosen for its simplicity and for symmetry
  with :UPCASE while still being useful.

Test Case:

   (let ((rt (copy-readtable nil)))
    (mapcar
      #'(lambda (case)
          (setf (readtable-case rt) case)
          (read-from-string "Zebra"))
      '(:upcase :downcase :preserve :invert)))

    => (ZEBRA |zebra| |Zebra| |zEBRA|) ;as printed with the standard
                                       ;readtable and *print-case* :upcase

Current Practice:

  While there may not be any current implementation that supports
  exactly this proposal, several implementations provide some means
  for changing case sensitivity.

  Franz Inc's ExCL has a function, EXCL:SET-CASE-MODE, that sets both
  the "preferred case" (the case of characters in the print names of
  standard symbols such as CAR) and whether or not the reader is case-
  sensitive.

  In Symbolics Common Lisp, the function SET-CHARACTER-TRANSLATION
  can be used to make the translation of a letter be that same letter,
  thus achieving case-sensitivity.

  Xerox Medley has a function for setting a readtable flag that
  determines case sensitivity.

Cost to Implementors:

  Fairly small.  The reader will be slightly slower and readtables
  will be slightly more complex.

Cost to Users:

  Slight.  Programmers must already take into account the possibility
  that *READTABLE* will be a non-standard readtable.  Case-sensitivity
  is no worse than character macros in this respect.

Cost of Non-Adoption:

  Applications that want to read mixed-case expressions will not
  be able to use the Common Lisp reader to do so (except, perhaps,
  by tortuous use of read macros).

  Programming styles that rely on case distinctions (without escape
  characters) will effectively be impossible in Common Lisp.

Benefits:

  Applications will be able to read mixed-case expressions.

  Programmers will be able to make use of case distinctions.

Aesthetics:

  For the proposal: 

    The language will have greater symmetry, because it will be
    possible to control the treatment of case on both input and output
    instead of only on output (as is now the case).

    The language will look less old-fashioned.

  Against the proposal:
  
    It is, perhaps, inconsistent to control case-sensitivity by a
    readtable operation when other aspects of the reader, such as the
    input base and the default float format (not to mention the
    package), are controlled by special variables.  However, it can be
    argued that character-level syntax is determined chiefly by the
    readtable.  Case-sensitivity can be seen as analogous to character
    macros in this respect.

Discussion:

  Dalton supports the proposal READTABLE-KEYWORDS.

  Version 1 of the proposal suggested a new global variable rather
  than a property of the readtable.  Pitman was strongly opposed to
  that proposal and gave convincing arguments that it should be
  dropped.  Gray suggested that the readtable property should be a
  function.  Versions 2 and 3 included a FUNCTION proposal as well
  as the KEYWORD one.  But at the March 1989 X3J13 meeting it was
  felt that there should be only a single proposal and, since
  opinion seemed to favor the KEYWORD proposal, the FUNCTION
  proposal was dropped.

  In earlier versions of the proposal, :INVERT worked a letter at
  a time (rather than operating on extended tokens) so that, for
  example, Zebra read as zEBRA.  However, the purpose of :INVERT
  is to let the programmer get the standard internal case (ie,
  upper case) by writing lower case rather than upper.  This
  matters when referring to single-case symbols such as those
  in the LISP package.  But, in most cases, mixed-case identifiers
  will already have the right case.  For example, one would use
  TheNextWindow to get TheNextWindow, not tHEnEXTwINDOW.