Issue: READ-CASE-SENSITIVITY (Version 4)
- To: "David A. Moon" <Moon@stony-brook.scrc.symbolics.com>
- Subject: Issue: READ-CASE-SENSITIVITY (Version 4)
- From: Jeff Dalton <jeff%aiai.edinburgh.ac.uk@NSFnet-Relay.AC.UK>
- Date: Thu, 22 Jun 89 19:32:17 BST
- Cc: cl-cleanup@sail.stanford.edu
- In-reply-to: David A. Moon's message of Wed, 21 Jun 89 16:54 EDT
Maybe this is OK. If anyone has objections, I'd really like to hear
them before everyone goes off to the meeting. (Um, I know it's rather
late.)
Issue: READ-CASE-SENSITIVITY
Forum: Cleanup
References: CLtL p 334 ff: What the Read Function Accepts,
especially p 337, step 8, point 1.
CLtL p 360 ff: The Readtable
COPY-READTABLE (CLtL, p 361)
*PRINT-CASE* (CLtL, p 372)
Category: ADDITION/CHANGE
Edit history: Version 1, 15-Feb-89, by Dalton
              Version 2, 23-Mar-89, by Dalton
                (completely new proposal after comments from
                Pitman, Gray, Masinter, and R.Tobin@uk.ac.ed)
              Version 3, 16-Jun-89, by Dalton
                (very minor changes in presentation
                and some additions to the discussion)
              Version 4, 22-Jun-89, by Dalton
                (removal of the FUNCTION proposal and a different
                specification for :INVERT after discussion with Moon)
Problem Description:
The Common Lisp reader always converts unescaped constituent
characters to upper case. (See CLtL, p 337, step 8, point 1.)
This behavior is not always desirable.
1. Lisp applications often use the Lisp reader to read their data.
This is often significantly easier than writing input routines
from scratch, especially if the input can be structured as lists.
However, certain applications want to make use of case distinctions,
and Common Lisp makes this unreasonably difficult. (You must define
every letter as a read macro and have the macro function read the
rest of the symbol, or else you must write a reader from scratch.)
2. Some programming languages distinguish between upper and lower
case in identifiers, and useful conventions are often built around
such distinctions. For example, in C, constants are often written
in upper case and variables in lower. In Mesa(?) and Smalltalk(?),
a capital letter is used to indicate the beginning of a new word
in identifiers made up of several words. In Edinburgh Prolog,
variables begin with upper-case letters and constant symbols do
not. The case-insensitivity of the Common Lisp reader makes
it difficult to use conventions of this sort.
Proposal (READ-CASE-SENSITIVITY:READTABLE-KEYWORDS)
Define a new settable function, (READTABLE-CASE <readtable>) to
control the reader's interpretation of case. The following values
may be given: :UPCASE, :DOWNCASE, :PRESERVE, and :INVERT.
When the value is :UPCASE, unescaped constituent characters
are converted to upper-case, as specified by CLtL on page 337.
When the value is :DOWNCASE, unescaped constituent characters
are converted to lower-case.
When the value is :PRESERVE, the case of all characters remains
unchanged.
When the value is :INVERT, then if all of the unescaped letters
in the extended token are of the same case, those (unescaped)
letters are converted to the opposite case.
COPY-READTABLE copies the setting of READTABLE-CASE. The value of
READTABLE-CASE for the standard readtable is :UPCASE.
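The four settings can be compared with a small helper. This is only a
sketch: it assumes an implementation that provides READTABLE-CASE with
the semantics proposed above, and TOKEN-NAME is a name made up for the
illustration, not part of the proposal. Note that it interns the symbols
it reads in the current package.

```lisp
;; Hypothetical helper, not part of the proposal.
;; Assumes READTABLE-CASE behaves as described above.
(defun token-name (string mode)
  "Read STRING under readtable case MODE and return the symbol's name."
  (let ((*readtable* (copy-readtable nil))) ; fresh copy of the standard readtable
    (setf (readtable-case *readtable*) mode)
    (symbol-name (read-from-string string))))

(token-name "Zebra" :upcase)    ; => "ZEBRA"
(token-name "Zebra" :downcase)  ; => "zebra"
(token-name "Zebra" :preserve)  ; => "Zebra"
(token-name "zebra" :invert)    ; => "ZEBRA"  (single case, so inverted)
(token-name "ZEBRA" :invert)    ; => "zebra"  (single case, so inverted)
(token-name "Zebra" :invert)    ; => "Zebra"  (mixed case, left alone)
```

The last three lines show why :INVERT is defined on whole tokens: only
single-case tokens change, so mixed-case names survive as written.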
The READTABLE-CASE of a readtable also has significance when
printing. The case in which letters are printed is determined as
follows:
When READTABLE-CASE is :UPCASE, upper-case letters are printed
in the case specified by *PRINT-CASE*, and lower-case letters
are printed in their own case.
When READTABLE-CASE is :DOWNCASE, lower-case letters are printed
in the case specified by *PRINT-CASE*, and upper-case letters
are printed in their own case.
When READTABLE-CASE is :PRESERVE, all letters are printed in their
own case.
When READTABLE-CASE is :INVERT, the case of all letters in single-
case symbol names is inverted. Mixed-case symbol names are printed
as-is.
(The behavior when *PRINT-CASE* is :CAPITALIZE is like :UPCASE for
the first character and :DOWNCASE for the rest.)
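The printing rules can be tried out in the same way. Again this is a
sketch under the assumption that the implementation follows the rules
above; PRINTED-NAME is a hypothetical helper. It prints without escapes
(PRINC) and binds *PRINT-GENSYM* to NIL so that the uninterned test
symbols carry no prefix.

```lisp
;; Hypothetical helper, not part of the proposal.
(defun printed-name (name readtable-mode print-case)
  "Return how a symbol named NAME prints, without escapes, under the
given readtable case and *PRINT-CASE*."
  (let ((*readtable* (copy-readtable nil))
        (*print-case* print-case)
        (*print-gensym* nil))           ; no #: prefix on the test symbol
    (setf (readtable-case *readtable*) readtable-mode)
    (princ-to-string (make-symbol name))))

(printed-name "ZEBRA" :upcase   :downcase)  ; => "zebra"
(printed-name "zebra" :downcase :upcase)    ; => "ZEBRA"
(printed-name "Zebra" :preserve :upcase)    ; => "Zebra"
(printed-name "ZEBRA" :invert   :upcase)    ; => "zebra"
(printed-name "Zebra" :invert   :upcase)    ; => "Zebra"
```

In the :PRESERVE and :INVERT lines the *PRINT-CASE* argument makes no
difference, which is the simplification the proposal is after.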
The rules for escaping letters in symbol names are also affected by
the READTABLE-CASE. If *PRINT-ESCAPE* is true, letters are escaped
as follows:
When READTABLE-CASE is :UPCASE, all lower-case letters must be
escaped.
When READTABLE-CASE is :DOWNCASE, all upper-case letters must be
escaped.
When READTABLE-CASE is :PRESERVE, no letters need be escaped.
When READTABLE-CASE is :INVERT, no letters need be escaped, because
single-case symbol names are printed with their case inverted and
mixed-case symbol names are printed as-is.
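One way to check an implementation against the printing and escaping
rules is to test that a symbol printed with escapes reads back, under
the same readtable, as a symbol with the same name. The sketch below
assumes READTABLE-CASE as proposed; ROUND-TRIPS-P is a hypothetical
helper, and it deliberately does not depend on which escape syntax
(vertical bars or backslashes) the printer chooses.

```lisp
;; Hypothetical helper, not part of the proposal.
(defun round-trips-p (name mode)
  "True if a symbol named NAME, printed with escapes under readtable
case MODE, reads back under the same setting with the same name."
  (let ((*readtable* (copy-readtable nil))
        (*print-gensym* nil))           ; print the uninterned symbol bare
    (setf (readtable-case *readtable*) mode)
    (let ((text (prin1-to-string (make-symbol name))))
      ;; Reading interns a symbol in the current package; only the
      ;; names are compared.
      (string= name (symbol-name (read-from-string text))))))

;; Every combination of name and setting should come back unchanged.
(every #'(lambda (mode)
           (every #'(lambda (name) (round-trips-p name mode))
                  '("ZEBRA" "zebra" "Zebra")))
       '(:upcase :downcase :preserve :invert))
```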
Rationale:
There are a number of different ways to achieve case-sensitivity.
This proposal is fairly simple but provides all of the functionality
that one could reasonably expect.
By using a property of the readtable, we avoid introducing a new
special variable. Any code that wishes to control all of the
reader's parameters already takes *READTABLE* into account. A new
special variable would require such code to change.
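As an illustration of this point, an application that reads
case-sensitive data could keep its own readtable and bind *READTABLE*
around the call to the reader, exactly as it would for any other
non-standard syntax. This is a sketch assuming READTABLE-CASE as
proposed; *DATA-READTABLE* and READ-DATA-FROM-STRING are names invented
for the example.

```lisp
;; Hypothetical names, not part of the proposal.
(defvar *data-readtable*
  (let ((rt (copy-readtable nil)))      ; start from the standard readtable
    (setf (readtable-case rt) :preserve)
    rt)
  "Readtable used only for application data; keeps case as written.")

(defun read-data-from-string (string)
  "Read one expression from STRING, preserving the case of symbols."
  (let ((*readtable* *data-readtable*)) ; the global readtable is untouched
    (read-from-string string)))

(mapcar #'symbol-name (read-data-from-string "(Alpha beta GAMMA)"))
;; => ("Alpha" "beta" "GAMMA")
```

Code that already rebinds *READTABLE* to get standard syntax needs no
change to get standard case handling as well.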
:DOWNCASE is included for symmetry with :UPCASE.
:INVERT is included so that case conventions can be used in Common
Lisp code without requiring that the names of symbols in the "LISP"
package be written in upper case. (Opinions vary as to whether it
is advisable to use such conventions, but this proposal leaves that
choice to the user.)
:INVERT has an effect only for single-case names so that mixed-
case names can be interpreted in a more straightforward way.
In order to avoid complex interactions between the case setting of
the readtable and *PRINT-CASE*, this proposal specifies a
significance for *PRINT-CASE* only when the case setting is :UPCASE
or :DOWNCASE. The meaning of *PRINT-CASE* when the readtable
setting is :DOWNCASE was chosen for its simplicity and for symmetry
with :UPCASE while still being useful.
Test Case:
  (let ((rt (copy-readtable nil)))
    (mapcar
      #'(lambda (case)
          (setf (readtable-case rt) case)
          (let ((*readtable* rt))        ;read with RT, not the current readtable
            (read-from-string "Zebra")))
      '(:upcase :downcase :preserve :invert)))
  => (ZEBRA |zebra| |Zebra| |Zebra|) ;as printed with the standard
                                     ;readtable and *print-case* :upcase
Current Practice:
While there may not be any current implementation that supports
exactly this proposal, several implementations provide some means
for changing case sensitivity.
Franz Inc's ExCL has a function, EXCL:SET-CASE-MODE, that sets both
the "preferred case" (the case of characters in the print names of
standard symbols such as CAR) and whether or not the reader is case-
sensitive.
In Symbolics Common Lisp, the function SET-CHARACTER-TRANSLATION
can be used to make the translation of a letter be that same letter,
thus achieving case-sensitivity.
Xerox Medley has a function for setting a readtable flag that
determines case sensitivity.
Cost to Implementors:
Fairly small. The reader will be slightly slower and readtables
will be slightly more complex.
Cost to Users:
Slight. Programmers must already take into account the possibility
that *READTABLE* will be a non-standard readtable. Case-sensitivity
is no worse than character macros in this respect.
Cost of Non-Adoption:
Applications that want to read mixed-case expressions will not
be able to use the Common Lisp reader to do so (except, perhaps,
by tortuous use of read macros).
Programming styles that rely on case distinctions (without escape
characters) will effectively be impossible in Common Lisp.
Benefits:
Applications will be able to read mixed-case expressions.
Programmers will be able to make use of case distinctions.
Aesthetics:
For the proposal:
The language will have greater symmetry, because it will be
possible to control the treatment of case on both input and output
instead of only on output (as is now the case).
The language will look less old-fashioned.
Against the proposal:
It is, perhaps, inconsistent to control case-sensitivity by a
readtable operation when other aspects of the reader, such as the
input base and the default float format (not to mention the
package), are controlled by special variables. However, it can be
argued that character-level syntax is determined chiefly by the
readtable. Case-sensitivity can be seen as analogous to character
macros in this respect.
Discussion:
Dalton supports the proposal READTABLE-KEYWORDS.
Version 1 of the proposal suggested a new global variable rather
than a property of the readtable. Pitman was strongly opposed to
that proposal and gave convincing arguments that it should be
dropped. Gray suggested that the readtable property should be a
function. Versions 2 and 3 included a FUNCTION proposal as well
as the KEYWORD one. But at the March 1989 X3J13 meeting it was
felt that there should be only a single proposal and, since
opinion seemed to favor the KEYWORD proposal, the FUNCTION
proposal was dropped.
In earlier versions of the proposal, :INVERT worked a letter at
a time (rather than operating on extended tokens) so that, for
example, Zebra read as zEBRA. However, the purpose of :INVERT
is to let the programmer get the standard internal case (i.e.,
upper case) by writing lower case rather than upper. This
matters when referring to single-case symbols such as those
in the LISP package. But, in most cases, mixed-case identifiers
will already have the right case. For example, one would use
TheNextWindow to get TheNextWindow, not tHEnEXTwINDOW.