
issues for international languages



            What characters are alpha-char-p? graphic-char-p?
            What does char-upcase do for non-Roman characters?

  As these are all character-set dependent, they suggest the need
for a mechanism to define them (as well as string-downcase,
string-capitalize, upper-case-p, etc.) for each distinct character
set in use.  For example, I believe it is correct that Chinese
(Hanzi) has no alphabetic characters (i.e., it has no alphabet).
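One way to picture such a mechanism is a per-character-set bundle of predicates and case functions that the string operations consult. The sketch below is in Python purely for illustration; `CharSet`, `ASCII`, `HANZI` and `string_upcase` are hypothetical names, not Common Lisp or CLtL functions, and the Hanzi entry simply follows the belief above that Chinese has no alphabetic characters (and, by assumption here, no case).

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class CharSet:
    """Hypothetical bundle of character-set-dependent predicates."""
    name: str
    alpha_p: Callable[[str], bool]
    upper_case_p: Callable[[str], bool]
    char_upcase: Callable[[str], str]


def _ascii_upcase(c):
    # Only a-z have upper-case counterparts in plain ASCII.
    return chr(ord(c) - 32) if "a" <= c <= "z" else c


ASCII = CharSet(
    name="ascii",
    alpha_p=lambda c: ("a" <= c <= "z") or ("A" <= c <= "Z"),
    upper_case_p=lambda c: "A" <= c <= "Z",
    char_upcase=_ascii_upcase,
)

# Assumed descriptor for a script with no alphabet and no case:
# nothing is alphabetic and case conversion is the identity.
HANZI = CharSet(
    name="hanzi",
    alpha_p=lambda c: False,
    upper_case_p=lambda c: False,
    char_upcase=lambda c: c,
)


def string_upcase(s, charset):
    """Upcase a string using whichever character set it belongs to."""
    return "".join(charset.char_upcase(c) for c in s)


print(string_upcase("abc", ASCII))
print(string_upcase("中文", HANZI))
```

With this arrangement, string-upcase and its relatives are written once and take the character set as a parameter; how a program chooses the right descriptor for a given string remains the open design question.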


            Should we (can we) define some procedures that preserve...


  This would seem to rest primarily (entirely?) on the definition of
the string comparison functions.  Currently, CLtL states (p. 301) that
"A string a is less than a string b if in the first position in
which they differ the character of a is less than the corresponding
character of b according to the function char<,  or..."
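Read literally, the quoted rule is a plain lexicographic comparison. A minimal Python sketch, assuming that ordering single characters by code point (Python's `<` on one-character strings) stands in for `char<`; `string_less` is a hypothetical name that mimics `string<` by returning the mismatch index when true and `None` (for NIL) otherwise.

```python
def string_less(a, b):
    """Lexicographic string<, with code-point order standing in for char<."""
    for i, (ca, cb) in enumerate(zip(a, b)):
        if ca != cb:
            # The first differing position decides the result.
            return i if ca < cb else None
    # No differing position: a is less only if it is a proper prefix of b,
    # which is how string< treats prefixes.
    return len(a) if len(a) < len(b) else None


print(string_less("abc", "abd"))       # 2
print(string_less("ab", "abc"))        # 2
print(string_less("abc", "abc"))       # None
print(string_less("Strauch", "Straße"))  # 4
```

Under this rule "Straße" sorts after "Strauch", because ß has a larger code than u; that is the kind of result the German example objects to.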

In general, this is not sufficient:

An example from German: the single character double-s (ß) is sorted as if it were ss.
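A common workaround is to compare collation keys rather than raw characters. A minimal sketch, assuming a hypothetical key function `german_key` that only expands ß to "ss" (real German collation also has to handle umlauts and case):

```python
def german_key(s):
    """Hypothetical collation key: ß compares as the two letters "ss"."""
    return s.replace("ß", "ss")


words = ["Strasse", "Straße", "Strand", "Strauch"]

# Raw code-point order puts ß (U+00DF) after every ASCII letter.
by_code_point = sorted(words)
# Key order files "Straße" next to "Strasse"; the two keys tie, so
# Python's stable sort keeps their input order.
by_german_key = sorted(words, key=german_key)

print(by_code_point)  # ['Strand', 'Strasse', 'Strauch', 'Straße']
print(by_german_key)  # ['Strand', 'Strasse', 'Straße', 'Strauch']
```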

An example from Czech: the combination ch is normally processed as
  two characters, except in sorting, where it is treated as a single
  character falling after h.
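The Czech case needs a key that reads "ch" as one collation element. A minimal sketch, assuming lowercase unaccented words only; the `ALPHABET` table and `czech_key` are illustrative, and the real Czech alphabet also contains letters such as č, ř, š and ž that this sketch ignores (they would raise `KeyError`).

```python
# Assumed ordering: plain lowercase Latin letters, with the digraph
# "ch" inserted as a single letter immediately after "h".
ALPHABET = list("abcdefgh") + ["ch"] + list("ijklmnopqrstuvwxyz")
RANK = {letter: i for i, letter in enumerate(ALPHABET)}


def czech_key(word):
    """Map a lowercase word to a list of ranks, reading 'ch' as one letter."""
    key = []
    i = 0
    while i < len(word):
        if word.startswith("ch", i):
            key.append(RANK["ch"])
            i += 2
        else:
            key.append(RANK[word[i]])
            i += 1
    return key


words = ["chata", "hrad", "ihned", "cena"]
print(sorted(words))                 # ['cena', 'chata', 'hrad', 'ihned']
print(sorted(words, key=czech_key))  # ['cena', 'hrad', 'chata', 'ihned']
```

Plain character order files "chata" under c; the key files it after every word beginning with h, as described above.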

Japanese kanji has multiple sorting orders: telephone, dictionary,
  strokes, and radicals.  German usage in the FRG likewise has more than one.
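Several sort orders over the same strings can be modelled as several key functions feeding one comparison. The kanji orders need stroke and radical data not given here, so this sketch uses German instead, assuming the two conventions commonly attributed to DIN 5007: dictionary order, which files ä, ö, ü under a, o, u, and phone-book order, which files them as ae, oe, ue; both treat ß as ss. The tables and `make_key` are illustrative, and the keys are case-insensitive by assumption.

```python
DICTIONARY = {"ä": "a", "ö": "o", "ü": "u", "ß": "ss"}
PHONE_BOOK = {"ä": "ae", "ö": "oe", "ü": "ue", "ß": "ss"}


def make_key(table):
    """Build a case-insensitive collation key from a character-expansion table."""
    def key(word):
        return "".join(table.get(c, c) for c in word.lower())
    return key


names = ["Mufti", "Müller", "Muller"]
dictionary_order = sorted(names, key=make_key(DICTIONARY))
phone_book_order = sorted(names, key=make_key(PHONE_BOOK))

print(dictionary_order)  # ['Mufti', 'Müller', 'Muller']
print(phone_book_order)  # ['Müller', 'Mufti', 'Muller']
```

The same strings come out in different orders depending only on which key is chosen, so a single fixed `char<` cannot serve both conventions.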


Thus, either the definition of string comparison changes to say something
less stringent or, as you suggest, new comparison functions are
needed.  Are there any examples of handling this in other
programming languages?
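One possible shape for such a new comparison function, sketched in Python under stated assumptions: compare collation keys first and fall back to plain code-point order only when the keys tie, so that distinct strings that collate equally still get a deterministic order. `collating_compare` is a hypothetical name; the key used here is Python's built-in `str.casefold`, which, among other things, folds ß to "ss".

```python
from functools import cmp_to_key


def collating_compare(a, b, key):
    """Return -1, 0 or 1: compare collation keys, then raw characters."""
    ka, kb = key(a), key(b)
    if ka != kb:
        return -1 if ka < kb else 1
    # Keys tie (e.g. "STRASSE" vs "Straße" under casefold): fall back to
    # code-point order so distinct strings never compare equal.
    if a != b:
        return -1 if a < b else 1
    return 0


words = ["Straße", "Strauch", "STRASSE", "Strand"]
ordered = sorted(words, key=cmp_to_key(
    lambda a, b: collating_compare(a, b, str.casefold)))
print(ordered)  # ['Strand', 'STRASSE', 'Straße', 'Strauch']
```

Because the tie-break is part of the comparison rather than an accident of sort stability, the result does not depend on the order of the input list.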

Regards,
  Thom