Issue: READ-CASE-SENSITIVITY (Version 3)
- To: email@example.com
- Subject: Issue: READ-CASE-SENSITIVITY (Version 3)
- From: Jeff Dalton <jeff%aiai.edinburgh.ac.uk@NSFnet-Relay.AC.UK>
- Date: Fri, 16 Jun 89 19:35:44 BST
I am not going to be able to come to the meeting but would
still like to see this issue considered. I wasn't present
for the discussion at the last meeting, but my understanding
is that the committee wanted to see the issue again with
only a single proposal. Unfortunately, I don't know which
proposal (keyword or function) most people prefer.
It has also been remarked that :INVERT is somewhat strange in that it
would have Zebra read as zEBRA; and it was suggested that inversion
should happen only if the entire name were single-case.
Unfortunately, the processing has to happen a character at a time,
because READ has to do it only for characters that are not escaped.
For example, |zebra| should always read as zebra.
Apart from that, very little has been said about this proposal.
I hope that anyone who does have objections will be able to raise
them by mail before the meeting.
References: CLtL p 334 ff: What the Read Function Accepts,
especially p 337, step 8, point 1.
CLtL p 360 ff: The Readtable
COPY-READTABLE (CLtL, p 361)
*PRINT-CASE* (CLtL, p 372)
Edit history: Version 1, 15-Feb-89, by Dalton
Version 2, 23-Mar-89, by Dalton,
(completely new proposal after comments from
Pitman, Gray, Masinter, and R.Tobin@uk.ac.ed)
Version 3, 16-Jun-89, by Dalton
(very minor changes in presentation
and some additions to the discussion)
Problem description:
The Common Lisp reader always converts unescaped constituent
characters to upper case. (See CLtL, p 337, step 8, point 1.)
This behavior is not always desirable.
1. Lisp applications often use the Lisp reader to read their data.
This is often significantly easier than writing input routines
from scratch, especially if the input can be structured as lists.
However, certain applications want to make use of case distinctions,
and Common Lisp makes this unreasonably difficult. (You must define
every letter as a read macro and have the macro function read the
rest of the symbol, or else you must write a reader from scratch.)
2. Some programming languages distinguish between upper and lower
case in identifiers, and useful conventions are often built around
such distinctions. For example, in C, constants are often written
in upper case and variables in lower. In Mesa(?) and Smalltalk(?),
a capital letter is used to indicate the beginning of a new word
in identifiers made up of several words. In Edinburgh Prolog,
variables begin with upper-case letters and constant symbols do
not. The case-insensitivity of the Common Lisp reader makes
it difficult to use conventions of this sort.
Define a new settable function, (READTABLE-CASE <readtable>), to
control the reader's interpretation of case. The following values
may be given:
:UPCASE -- convert unescaped characters to upper-case, as now.
:DOWNCASE -- convert unescaped characters to lower-case.
:PRESERVE -- don't convert, leaving lower-case letters in lower
case and upper-case characters in upper case.
:INVERT -- convert lower-case to upper and upper-case to lower.
COPY-READTABLE copies the setting of READTABLE-CASE. The value of
READTABLE-CASE for the standard readtable is :UPCASE.
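As a rough illustration of the four settings, here is a minimal sketch that models the translation on a token string rather than inside READ. The names SKETCH-TRANSLATE-CHAR and SKETCH-TOKEN-NAME are hypothetical and not part of the proposal. The sketch assumes that only the \ single escape and |...| multiple escapes need to be handled, and it ignores everything else the reader does with a token.

```lisp
;; Sketch only: models the proposed per-character rule on a token string.
;; SKETCH-TRANSLATE-CHAR and SKETCH-TOKEN-NAME are hypothetical names.
(defun sketch-translate-char (ch mode)
  "Translate one unescaped constituent character according to MODE."
  (ecase mode
    (:upcase   (char-upcase ch))
    (:downcase (char-downcase ch))
    (:preserve ch)
    (:invert   (cond ((upper-case-p ch) (char-downcase ch))
                     ((lower-case-p ch) (char-upcase ch))
                     (t ch)))))

(defun sketch-token-name (token mode)
  "Return the symbol name that TOKEN would produce under MODE.
Escaped characters are copied unchanged; unescaped ones are translated."
  (let ((in-bars nil)
        (i 0)
        (n (length token)))
    (with-output-to-string (out)
      (loop while (< i n)
            do (let ((ch (char token i)))
                 (cond ((char= ch #\|)
                        ;; Multiple escape: toggle, emit nothing.
                        (setf in-bars (not in-bars)))
                       ((and (char= ch #\\) (< (1+ i) n))
                        ;; Single escape: copy the next character as is.
                        (incf i)
                        (write-char (char token i) out))
                       (in-bars
                        (write-char ch out))
                       (t
                        (write-char (sketch-translate-char ch mode) out))))
               (incf i)))))

(mapcar #'(lambda (mode) (sketch-token-name "Zebra" mode))
        '(:upcase :downcase :preserve :invert))
;; => ("ZEBRA" "zebra" "Zebra" "zEBRA")
(sketch-token-name "|zebra|" :upcase)
;; => "zebra"
```

Because the translation is applied a character at a time, the escaped characters in |zebra| are never touched, whatever the setting.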
The READTABLE-CASE of a readtable also has significance when
printing. The case in which letters are printed is determined as
follows:
When READTABLE-CASE is :UPCASE, upper-case letters are printed in the
case specified by *PRINT-CASE*.
When READTABLE-CASE is :DOWNCASE, lower-case letters are printed in
the case specified by *PRINT-CASE*.
When READTABLE-CASE is :PRESERVE, letters are printed in their own
case.
When READTABLE-CASE is :INVERT, the case of all letters is inverted.
(The behavior when *PRINT-CASE* is :CAPITALIZE is like :UPCASE for
the first character and :DOWNCASE for the rest.)
The rules for escaping letters are also affected by the READTABLE-CASE.
If *PRINT-ESCAPE* is true, letters are escaped as follows:
When READTABLE-CASE is :UPCASE, all lower-case letters must be escaped.
When READTABLE-CASE is :DOWNCASE, all upper-case letters must be escaped.
Otherwise, no letters need be escaped.
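The printing and escaping rules above can be sketched as follows. This is only a model of the rules as stated here, not a description of any implementation's printer. The function names are hypothetical, :CAPITALIZE is treated exactly as described above (first character of the name only), and SKETCH-PRINTED-NAME assumes the name is printed without escapes.

```lisp
;; Sketch only: models the printing rules of the keyword proposal.
;; All three function names are hypothetical.
(defun sketch-printed-letter (ch mode print-case firstp)
  "Return CH as it would be printed when the readtable's case is MODE."
  (flet ((apply-print-case (c)
           (ecase print-case
             (:upcase     (char-upcase c))
             (:downcase   (char-downcase c))
             (:capitalize (if firstp (char-upcase c) (char-downcase c))))))
    (ecase mode
      ;; *PRINT-CASE* applies only to letters in the readtable's own case.
      (:upcase   (if (upper-case-p ch) (apply-print-case ch) ch))
      (:downcase (if (lower-case-p ch) (apply-print-case ch) ch))
      (:preserve ch)
      (:invert   (cond ((upper-case-p ch) (char-downcase ch))
                       ((lower-case-p ch) (char-upcase ch))
                       (t ch))))))

(defun sketch-printed-name (name mode print-case)
  "Return the letters of NAME as printed, ignoring escape characters."
  (let ((out (copy-seq name)))
    (dotimes (i (length out) out)
      (setf (char out i)
            (sketch-printed-letter (char out i) mode print-case (zerop i))))))

(defun sketch-letter-needs-escape-p (ch mode)
  "True if CH must be escaped when *PRINT-ESCAPE* is true."
  (ecase mode
    (:upcase   (lower-case-p ch))
    (:downcase (upper-case-p ch))
    ((:preserve :invert) nil)))

(sketch-printed-name "ZEBRA" :upcase :downcase)   ; => "zebra"
(sketch-printed-name "zebra" :downcase :upcase)   ; => "ZEBRA"
(sketch-printed-name "zEBRA" :invert :upcase)     ; => "Zebra"
```

Note that under :INVERT the printed form of the name zEBRA is Zebra, which is the token that would read back as zEBRA under the same setting.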
Define a new settable function (READTABLE-CHARACTER-TRANSLATION
<readtable>) to control the reader's interpretation of unescaped
constituent characters. The value may be any function of type
(FUNCTION (CHARACTER) CHARACTER). Where the reader now converts
such characters to upper case it should instead call the function
that is the value of READTABLE-CHARACTER-TRANSLATION for the current
readtable. (See CLtL, page 337, step 8, point 1.)
COPY-READTABLE copies the setting of READTABLE-CHARACTER-TRANSLATION.
The value for the standard readtable is CHAR-UPCASE.
The READTABLE-CHARACTER-TRANSLATION of a readtable also has
significance when printing. The printer recognizes certain functions
that control the reader's interpretation of case and alters its
behavior accordingly. This behavior is given by the following
correspondence between functions and the keywords described above.
[This is just to avoid repeating a lot of text.]
   CHAR-UPCASE        corresponds to  :UPCASE
   CHAR-DOWNCASE      corresponds to  :DOWNCASE
   IDENTITY           corresponds to  :PRESERVE
   CHAR-INVERT-CASE   corresponds to  :INVERT
The function can be given either as a symbol or as one of the values
#'CHAR-UPCASE, #'CHAR-DOWNCASE, #'IDENTITY, #'CHAR-INVERT-CASE.
If the READTABLE-CHARACTER-TRANSLATION is not one of the functions
listed above, letters are always printed in their own case (in
particular, *PRINT-CASE* has no effect), and all characters in
symbol names are escaped if *PRINT-ESCAPE* is true.
Define a new function CHAR-INVERT-CASE of type (FUNCTION (CHARACTER)
CHARACTER) analogous to CHAR-UPCASE and CHAR-DOWNCASE. It attempts
to convert its argument to upper-case if the argument is lower-case
and to lower-case if the argument is upper-case.
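A minimal sketch of such a function, written with the existing case predicates, might look like this. It is defined under the hypothetical name SKETCH-CHAR-INVERT-CASE so that it does not claim to be the proposed CHAR-INVERT-CASE itself. Characters that have no case are returned unchanged.

```lisp
;; Sketch only: one possible definition of the case-inverting function.
;; SKETCH-CHAR-INVERT-CASE is a hypothetical name.
(defun sketch-char-invert-case (ch)
  "Return CH with its case inverted, or CH itself if it has no case."
  (cond ((lower-case-p ch) (char-upcase ch))
        ((upper-case-p ch) (char-downcase ch))
        (t ch)))

(map 'string #'sketch-char-invert-case "Zebra-1")
;; => "zEBRA-1"
```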
There are a number of different ways to achieve case-sensitivity.
These proposals are fairly simple but provide all of the
functionality that one could reasonably expect.
By using a property of the readtable, we avoid introducing a new
special variable. Any code that wishes to control all of the
reader's parameters already takes *READTABLE* into account. A new
special variable would require such code to change.
:DOWNCASE is included for symmetry with :UPCASE. :INVERT is
included so that case conventions could be used in Common Lisp code
without requiring that the names of symbols in the "LISP" package be
written in upper case. (Opinions vary as to whether it is advisable
to use such conventions, but this proposal leaves that choice to the
In order to avoid complex interactions between the case setting of
the readtable and *PRINT-CASE*, this proposal specifies a
significance for *PRINT-CASE* only when the case setting is :UPCASE
or :DOWNCASE. The meaning of *PRINT-CASE* when the readtable
setting is :DOWNCASE was chosen for its simplicity and for symmetry
with :UPCASE while still being useful.
;; keyword version
(let ((rt (copy-readtable nil)))
  (mapcar #'(lambda (case)
              (setf (readtable-case rt) case)
              (let ((*readtable* rt))   ;read with the modified copy
                (read-from-string "Zebra")))
          '(:upcase :downcase :preserve :invert)))
=> (ZEBRA |zebra| |Zebra| |zEBRA|) ;as printed with the standard
                                   ;readtable and *print-case* :upcase
Current practice:
While there may not be any current implementation that supports
exactly this proposal, several implementations provide some means
for changing case sensitivity.
Franz Inc's ExCL has a function, EXCL:SET-CASE-MODE, that sets both
the "preferred case" (the case of characters in the print names of
standard symbols such as CAR) and whether or not the reader is
case-sensitive.
In Symbolics Common Lisp, the function SET-CHARACTER-TRANSLATION
can be used to make the translation of a letter be that same letter,
thus achieving case-sensitivity.
Xerox Medley has a function for setting a readtable flag that
determines case sensitivity.
Cost to Implementors:
Fairly small. The reader will be slightly slower and readtables
will be slightly more complex.
Cost to Users:
Slight. Programmers must already take into account the possibility
that *READTABLE* will be a non-standard readtable. Case-sensitivity
is no worse than character macros in this respect.
Cost of Non-Adoption:
Applications that want to read mixed-case expressions will not
be able to use the Common Lisp reader to do so (except, perhaps,
by tortuous use of read macros).
Programming styles that rely on case distinctions (without escape
characters) will be effectively impossible in Common Lisp.
Benefits:
Applications will be able to read mixed-case expressions.
Programmers will be able to make use of case distinctions.
For the proposals:
The language will have greater symmetry, because it will be
possible to control the treatment of case on both input and output
instead of only on output (as is now the case).
The language will look less old-fashioned.
Against the proposals:
It is, perhaps, inconsistent to control case-sensitivity by a
readtable operation when other aspects of the reader, such as the
input base and the default float format (not to mention the
package), are controlled by special variables. However, it can be
argued that character-level syntax is determined chiefly by the
readtable. Case-sensitivity can be seen as analogous to character
macros in this respect.
Keywords vs function
The keyword proposal is somewhat simpler and, by being less
powerful, avoids suggesting the possibility of more general
character translation (for every character, say, rather than
just for unescaped constituents).
The function proposal is perhaps more elegant.
Dalton supports both proposals but slightly prefers READTABLE-KEYWORDS.
Version 1 of the proposal suggested a new global variable rather
than a property of the readtable. Pitman was strongly opposed to
that proposal and gave convincing arguments that it should be
dropped. Gray suggested that the readtable property should be a
It has been remarked that :INVERT produces somewhat strange
results. For example, Zebra reads as zEBRA. It was suggested
that inversion should happen only if the entire token was
single-case.
However, READ has to take escape characters into account (so that,
for example, |zebra| always reads as zebra), and then it is
difficult to know what rules to apply to the entire token.
Moreover, the description of READ in CLtL does not provide a
convenient place to insert processing of that sort (by the time
the full token is considered, the escape characters have been