
Issue: READ-CASE-SENSITIVITY (Version 1)



It may be rather late for new issues, but I think this one should
still be considered.  The proposal below is fairly minimal.  More
far-reaching proposals, or ones that are better in other ways, are
also possible; but this seems a good place to start.

A longstanding problem in Common Lisp is that the reader always
converts unescaped characters in symbol names to upper case.
This makes a number of applications more difficult than they
should be and looks like an oversight (especially given the
existence of *PRINT-CASE*).

This problem is an easy target for critics of Common Lisp and
tends to inflame opinion.  Here are three comments I have seen
recently in the UK:

   [...] particular small lunacies (e.g. the printer can be parameterised
   to case fold everything to lower case, but the reader ALWAYS folds
   to upper case, which is a real nasty for people who would like
   to write their own code in case sensitive styles with Car, CAR and car
   being different.  E.g. it means that using the Lisp reader to accept
   (as a cheap way of so doing) natural language stuff from the user
   necessarily loses capitalisation, which is STUPID.  I also happen to
   consider it ill-judged and a throw-back the TTY33s and punched cards
   to work in UPPER CASE FOR ALL LISP INTERNAL DATASTRUCTURES, SINCE I
   RATHER EXPECT MOST CURRENT PROGRAMMING STYLE IS BASED AROUND USE OF
   LOWER OR MIXED CASE.

   I also have a feeling the it is perpetuating the punched card era to
   have a so called modern language work entirely in upper case inside
   itself, and amazing to go to the trouble for case folding on both
   input and output to paper over this, and even more bizarre to make the
   output folding optional but not the input...)

   The single thing which is left which most upsets me is the bloody
   upper-case only reader - I *can't believe* that anyone thinks that's
   a good idea.

Of course, it is not just to answer such criticism that I propose
this change.

-----
Issue:        READ-CASE-SENSITIVITY
Forum:        Cleanup
References:   CLtL p 334 ff: What the Read Function Accepts;
                especially p 337.
              *PRINT-CASE* (CLtL, p 372)
Category:     ADDITION/CHANGE
Edit history: 15-Feb-89, Version 1 by Dalton
Status:       For Internal Discussion

Problem Description:

  The Common Lisp reader always converts unescaped constituent
  characters to upper case.  (See CLtL, p 337, step 8, point 1.)
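
  The folding can be observed directly.  A minimal sketch, assuming
  the standard readtable; the helper name READ-NAME is made up for
  illustration:

```lisp
;; Hypothetical helper: read one token from STRING with a fresh copy of
;; the standard readtable and return the name of the resulting symbol.
(defun read-name (string)
  (let ((*readtable* (copy-readtable nil)))
    (symbol-name (read-from-string string))))

(read-name "Zebra")     ; => "ZEBRA"  unescaped letters are folded
(read-name "|Zebra|")   ; => "Zebra"  multiple escape keeps the case
(read-name "Z\\ebra")   ; => "ZeBRA"  only the escaped letter is kept
```

  Only the escaped forms keep lower-case letters, which is why
  mixed-case input cannot be read without escape characters.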

Proposal (READ-CASE-SENSITIVITY:LIKE-PRINT-CASE)

  Add a new read parameter, *READ-CASE*, to control case
  sensitivity.  By analogy with *PRINT-CASE*, it may take the
  following values:

    :UPCASE   --  convert unescaped characters to upper-case, as now.
    :DOWNCASE --  don't convert, leaving lower-case letters in lower
                  case.

  This proposal does not provide a way to tell the reader to convert
  upper-case letters to lower-case.  This is consistent with
  *PRINT-CASE*, which does not provide a way to print lower-case
  letters in upper-case.
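
  The two values can be modelled on a bare token string.  This is
  only a sketch of the intended conversion, not a reader: it ignores
  escape characters, and the function name CONVERT-TOKEN is made up
  for illustration.  *READ-CASE* is defined here as an ordinary
  special variable because no implementation provides it.

```lisp
;; Stand-in for the proposed parameter; not part of any implementation.
(defvar *read-case* :upcase)

;; Hypothetical helper: return the symbol name the proposal would
;; produce for the unescaped token TOKEN.
(defun convert-token (token)
  (ecase *read-case*
    (:upcase   (string-upcase token))   ; convert, as now
    (:downcase token)))                 ; leave every character alone

(let ((*read-case* :downcase))
  (convert-token "Zebra"))              ; => "Zebra"
```

  Under :DOWNCASE the token is returned unchanged, so upper-case
  letters stay in upper case as well.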

  Note that an isomorphic proposal that would not emphasise an
  analogy with *PRINT-CASE* would be to add a parameter called
  *READ-CASE-SENSITIVE*, taking the values T (to preserve case)
  or NIL (to convert to upper case).  Yet another proposal might
  be to add a function READ-PRESERVING-CASE (hee hee).

Rationale:

  There are several reasons for this proposal.

  1.  Lisp applications often use the Lisp reader to read their data.
  This is often significantly easier than writing input routines
  from scratch, especially if the input can be structured as lists.
  However, certain applications want to make use of case distinctions,
  and Common Lisp makes this unreasonably difficult.  (You must define
  every letter as a read macro and have the macro function read the
  rest of the symbol.)

  2.  Some programming languages distinguish between upper and lower
  case in identifiers, and useful conventions are often built around
  such distinctions.  For example, in C, constants are often written
  in upper case and variables in lower.  In Mesa(?) and Smalltalk(?),
  a capital letter is used to indicate the beginning of a new word
  in identifiers made up of several words.  In Edinburgh Prolog,
  variables begin with upper-case letters and constant symbols do
  not.  The case-insensitivity of the Common Lisp reader makes
  it difficult to use conventions of this sort.

  3.  Among Lisp dialects, Common Lisp gives an unusual degree of
  control over the case of output.  However, there is no control over
  the treatment of case on input.  This makes the language unbalanced.

  We live in a mixed-case world, and it should be possible to make use
  of case distinctions in Common Lisp.
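
  The read-macro workaround mentioned in point 1 might look roughly
  like the following.  It is a sketch under stated assumptions, not a
  complete solution: the function names are made up, only simple
  tokens that begin with a letter are handled, and package prefixes
  and escape characters inside the token are not supported.

```lisp
;; Macro function installed on every letter: collect the rest of the
;; token without case conversion and intern it in the current package.
(defun read-rest-of-symbol (stream first-char)
  (let ((chars (list first-char)))
    (loop for next = (peek-char nil stream nil nil t)
          while (and next
                     (or (alphanumericp next)
                         (find next "-_*+/<>=!?")))
          do (push (read-char stream t nil t) chars))
    ;; VALUES ensures the macro function returns exactly one value.
    (values (intern (coerce (nreverse chars) 'string)))))

;; Copy the standard readtable and make every letter a non-terminating
;; macro character, so that letters inside numbers such as 1e5 are
;; left to the ordinary token reader.
(defun make-case-preserving-readtable ()
  (let ((table (copy-readtable nil)))
    (loop for letter across
          "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"
          do (set-macro-character letter #'read-rest-of-symbol t table))
    table))

;; Reads a list of three symbols named "Zebra", "car" and "CAR".
(let ((*readtable* (make-case-preserving-readtable)))
  (read-from-string "(Zebra car CAR)"))
```

  Even this small version has to reimplement part of the reader's
  token syntax, which is what makes the workaround unattractive.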

Test Case:

  (let ((*read-case* :downcase))
    (read-from-string "Zebra"))

  = ZEBRA     ;under CLtL
  = |Zebra|   ;under this proposal

Current Practice:

  I do not know of any implementation that implements this proposal.

  Franz Inc's ExCL has (or at least had) a function, excl:set-case-mode,
  that set both the "preferred case" (the case of characters in the print
  names of standard symbols such as CAR) and whether or not the reader
  was case-sensitive.

Cost to Implementors:

  Fairly small.

Cost to Users:

  None; this is a compatible change from the user's standpoint.

Cost of Non-Adoption:

  Applications that want to read mixed-case expressions will not
  be able to use the Common Lisp reader to do so (except, perhaps,
  by tortuous use of read macros).

  Programming styles that rely on case distinctions (without escape
  characters) will be impossible.

Benefits:

  Applications will be able to read mixed-case expressions.

  Programmers will be able to make use of case distinctions.

Aesthetics:

  For the proposal:
  The language will have greater symmetry.
  The language will look less old-fashioned.

  Against the proposal:
  The addition of another global parameter to control yet another
  aspect of I/O is inelegant and increases clutter.

  In favor of additional or more far-reaching changes:
  Anyone wishing to make use of case distinctions in Lisp programs
  will have to write the names of symbols in the "LISP" package in
  upper case.

Discussion:

  Larry Masinter suggested at one point that case sensitivity could
  be an aspect of read tables rather than a separate parameter.
  There may be several ways of doing this.  For example, it might
  be a property of each character or of the table as a whole.
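
  A sketch of the table-as-a-whole variant, assuming an
  implementation that provides the READTABLE-CASE accessor of the
  later ANSI Common Lisp standard (it is not in CLtL, so this is
  outside the present proposal).  There, :PRESERVE is the value that
  leaves case alone; its :DOWNCASE converts to lower case, unlike the
  :DOWNCASE proposed above.  The function name is made up for
  illustration.

```lisp
;; Hypothetical helper: read one object from STRING using a private
;; copy of the standard readtable whose case mode is :PRESERVE.
(defun read-from-string-keeping-case (string)
  (let ((*readtable* (copy-readtable nil)))
    (setf (readtable-case *readtable*) :preserve)
    (values (read-from-string string))))

(symbol-name (read-from-string-keeping-case "Zebra"))   ; => "Zebra"
```

  Because the setting lives in the table, binding *READTABLE* is
  enough to switch behaviour and no extra global parameter is needed.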

  An interesting possibility would be to disguise the preferred
  internal case by defining a value for *READ-CASE* called :INVERT.
  If the value were :INVERT, mixed-case symbols would remain the same
  (or perhaps they would be inverted too) but all upper case input
  would specify a lower-case name internally, and vice versa.
  You may recall that something similar was suggested for pathnames.
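
  The :INVERT idea can be sketched on a bare token string.  This
  takes the first reading above, in which mixed-case names are left
  alone; the function name INVERT-TOKEN-CASE is made up for
  illustration and escape characters are ignored.

```lisp
;; Hypothetical helper: all-upper-case tokens become lower case,
;; all-lower-case tokens become upper case, anything else is unchanged.
(defun invert-token-case (token)
  (let ((has-upper (some #'upper-case-p token))
        (has-lower (some #'lower-case-p token)))
    (cond ((and has-upper (not has-lower)) (string-downcase token))
          ((and has-lower (not has-upper)) (string-upcase token))
          (t token))))

(invert-token-case "car")     ; => "CAR"    matches the internal name
(invert-token-case "CAR")     ; => "car"
(invert-token-case "Zebra")   ; => "Zebra"  mixed case is kept
```

  With this mapping, code typed in lower case still finds the
  upper-case symbols of the "LISP" package, while mixed-case names
  keep their distinctions.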

  Dalton supports the proposal LIKE-PRINT-CASE.