Re: cs proposal
- To: Thom Linden <baggins@ibm.com>
- Subject: Re: cs proposal
- From: masinter.pa@Xerox.COM
- Date: 25 Oct 88 11:03 PDT
- Cc: "X3J13: Character Subcommittee" <cl-characters@sail.stanford.edu>, yuasa%tutics.tut.junet%utokyo-relay.CSNet@relay.cs.net, rpk@wheaties.ai.mit.edu, brent%hpfcpsb@hplabs.hp.com, sandra%defun@cs.utah.edu, vanroggen%atig.dec@decwrl.dec.com, moon@stony-brook.scrc.symbolics.com
- Cc: cl-cleanup@sail.stanford.edu
- In-reply-to: Thom Linden <baggins@ibm.com>'s message of Tue, 25 Oct 88 09:51:42 PDT
Thom:
The proposal did not ever say explicitly, and I feel strongly that it
should, that in fact Common Lisp *requires* absolutely no changes in order
to support extended character sets. The language as specified in CLtL is
entirely adequate to allow handling of multiple, international character
sets. The implementation of Xerox Common Lisp, now available on Xerox 1100
series workstations and Sun 3 and Sun 4 workstations, is an existence
proof.
There is a price, however, that implementations must pay in order to use
CLtL unchanged. The price can either be in terms of space -- reserving
enough bits per character in a string -- or in terms of speed, in
implementations in which not all strings are displaceable. (Briefly, the
implementation technique is to allocate a smaller number of bits per
character than the maximum in most strings, but allow for strings to be
displaced.)
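
The technique in the parenthetical above can be sketched roughly as
follows. This is a minimal illustration in Python rather than Lisp, and it
is only one reading of the idea, not a description of how Xerox Common Lisp
actually does it. The class name WideningString and its is_wide property
are hypothetical. The string starts with 8 bits per character and is
re-pointed at wider storage the first time a character that does not fit is
stored, which is where the speed cost shows up.

```python
from array import array


class WideningString:
    """Mutable string: 8-bit storage by default, widened on demand."""

    def __init__(self, text):
        codes = [ord(ch) for ch in text]
        # Use the narrow representation whenever every character fits.
        typecode = "B" if all(code < 256 for code in codes) else "L"
        self._store = array(typecode, codes)

    def __len__(self):
        return len(self._store)

    def __getitem__(self, index):
        return chr(self._store[index])

    def __setitem__(self, index, ch):
        code = ord(ch)
        if code > 255 and self._store.typecode == "B":
            # Re-point the string at wider storage (the "displacement"
            # step); every access pays for this extra indirection.
            self._store = array("L", self._store.tolist())
        self._store[index] = code

    @property
    def is_wide(self):
        return self._store.typecode == "L"

    def __str__(self):
        return "".join(chr(code) for code in self._store)


s = WideningString("abc")
narrow_before = s.is_wide      # still using 8-bit storage
s[1] = "\u3042"                # hiragana A forces the wider storage
print(narrow_before, s.is_wide, len(s))
```

Callers see a single string type throughout; only the storage behind it
changes, which is why no change to the language is needed for this
approach.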
Thus, the only changes to the language that can be justified from the point
of view of allowing support of International Character Sets are those that
have an arguably more efficient implementation. The change to the type
hierarchy, the modification of the STRING type from an abbreviation of
(VECTOR STRING-CHAR) to an indefinite union of types, and the various
changes associated with that to STRINGP etc. should be justified by some
explicit rationale as to the efficiency of the implementations under that
regime.
The Character Proposal includes several other enhancements and
modifications which are probably good ideas, but which require separate
discussion and justification. Removing CHAR-FONT is a good idea, because
the feature is not used. Removing CHAR-BITS is probably a good idea,
because the feature is not used widely, and was (perhaps) based on a design
decision which confounded keystrokes with characters and which allowed for
"characters" which could not be held in "strings". These changes can and
should be justified completely independently of any notion of
"international character set handling".
Extending the "CHARACTER" type specifier to have a list form is probably a
good idea, although it does not conform to any current practice. It lets a
single existing mechanism provide, in an easily extensible manner, what
would otherwise be an overambitious proliferation of character-predicate
functions. (Implementations that do not support hiragana can easily
support (typep x '(character :hiragana)) == NIL.) This is related to the
support of international character sets, but hardly required. Certainly
this does not go far enough to allow portable programs to be written. For
one small example, the proposal is silent about what CHAR-UPCASE and
CHAR-DOWNCASE do on Greek or Cyrillic characters, or accented characters in
European alphabets. Would not portable language manipulation programs
need portable definitions of these? Is there some reason in principle for
not agreeing on the operation of CHAR-UPCASE? Are hiragana characters
ALPHA-CHAR-P? Etc.
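
To make the questions above concrete, here is what one fixed, portable
answer could look like. The sketch uses Python's built-in string methods,
which follow the Unicode character database; that is an assumption chosen
purely for illustration and is not something the proposal or CLtL
specifies. The helper name case_report is hypothetical.

```python
def case_report(ch):
    """Return (upcased, downcased, is_alphabetic) for a one-character string."""
    return ch.upper(), ch.lower(), ch.isalpha()


# Latin a, Greek alpha, Cyrillic de, Latin e with acute, hiragana A.
for ch in ["a", "\u03b1", "\u0434", "\u00e9", "\u3042"]:
    print(repr(ch), case_report(ch))
```

Under these rules the Greek, Cyrillic and accented letters have uppercase
forms, while the hiragana character is alphabetic but has no case. A
program can rely on such answers only if the standard pins them down.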
The discussion of the proposal frequently confounds two separate
distinctions, of "backward compatibility" and "portability". Our primary
goal is to allow "portable" programs -- programs that, if written in the
standard language, will run unchanged in all implementations that support
the standard language. We try to achieve that while also supporting
"backward compatibility" -- programs that run in current implementations of
Common Lisp should continue to work correctly unchanged. I fear that the
proposal, in the name of efficiency and backward compatibility, damages
the portability of the resulting language, because it allows programs to
rely on implementation-dependent details of the nature of strings. It
damages backward compatibility, because valid programs that manipulated
strings will no longer be correct. And it does not make a convincing
argument that it has actually solved the problem of *efficient*
international character set handling.
I think you've done an admirable job of establishing the scope and extent
of possible changes to Common Lisp in the area of character handling.
However, I would like to see the proposal split up into separate "issues"
which each have their own pros and cons. I would like to see this happen so
that the cleanup committee doesn't have to clean up after the character
committee is done.
Sincerely,
Larry Masinter