Extending the address space of MIT Cscheme (long reply)



I hope the following clarifies the technical and stylistic points
you've raised:

    >    * How about some specific examples of design/engineering decisions
    >    that you consider flaws, and why?

    Gazillions of builtin types.  There are twice as many primitive types of
    objects as any other high-level language that I know of (3- and 4-element
    hunks? give me a break!).  Many could have been built as nonprimitives.
    See T for good efforts in that direction.

What's wrong with that?  It's nice (and ultimately more robust) to
allow the system to distinguish between various objects with 2, 3, or
more components.  For example, taking the CAR of a symbol (which in
CScheme has two components) will signal an error.  Similarly for other
kinds of objects.  Implementations on custom Lisp hardware often
allocate 5 or 6 tag bits so they can accommodate a large number of
primitive types for the same reason.
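A minimal sketch of why distinct type codes help, assuming a toy object layout (the type names, the `object` struct and `checked_car` are hypothetical, not CScheme's): a pair and a symbol may both have two components, but because each carries its own type code, a checked CAR can reject the symbol instead of silently returning its first slot.

```c
#include <stddef.h>

/* Hypothetical type codes; CScheme's real codes and names differ. */
enum type_code { TC_PAIR, TC_SYMBOL, TC_HUNK3, TC_FIXNUM };

/* Each heap object carries its type code next to its components. */
typedef struct object {
    enum type_code type;
    struct object *slot[3];   /* pairs and symbols use 2, hunk3s use 3 */
} object;

/* Set by the checked accessor when it sees the wrong type. */
int wrong_type_error = 0;

/* CAR is defined only on pairs.  A symbol also has two components,
   but its distinct type code lets the accessor signal an error
   instead of returning the symbol's first slot. */
object *checked_car(object *obj) {
    if (obj == NULL || obj->type != TC_PAIR) {
        wrong_type_error = 1;
        return NULL;
    }
    return obj->slot[0];
}
```

If pairs and two-component symbols shared a single "2-slot record" type, the check above would have nothing to test against.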

In the case of T, I suspect that they would do the same if they could
afford the tag bits, but their decision (which I'm not questioning or
criticizing) to use an object representation with low tag bits gives
them very few primitive types, so they had to reduce their number.
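The trade-off between the two tagging schemes can be sketched with a little bit arithmetic.  Both layouts below are assumptions for illustration (a 6-bit high tag in a 32-bit word, and 3 low tag bits freed by 8-byte alignment); neither is claimed to be CScheme's or T's exact format.  The high tag buys 64 type codes at the cost of address bits (26 bits of datum), while the low tag leaves the address bits intact but offers only 8 codes.

```c
#include <stdint.h>

/* High-tag layout (assumed): 6-bit type code in the top of a 32-bit
   word, leaving a 26-bit datum (and hence a smaller address space). */
#define HI_TYPE_BITS  6
#define HI_DATUM_BITS (32 - HI_TYPE_BITS)
#define HI_DATUM_MASK ((UINT32_C(1) << HI_DATUM_BITS) - 1)

uint32_t hi_make(uint32_t type, uint32_t datum) {
    return (type << HI_DATUM_BITS) | (datum & HI_DATUM_MASK);
}
uint32_t hi_type(uint32_t word)  { return word >> HI_DATUM_BITS; }
uint32_t hi_datum(uint32_t word) { return word & HI_DATUM_MASK; }

/* Low-tag layout (assumed): objects are 8-byte aligned, so the low
   3 bits of every object address are free to hold a tag. */
#define LO_TAG_BITS 3
#define LO_TAG_MASK ((UINT32_C(1) << LO_TAG_BITS) - 1)

uint32_t lo_make(uint32_t tag, uint32_t aligned_addr) {
    return aligned_addr | (tag & LO_TAG_MASK);
}
uint32_t lo_tag(uint32_t word)  { return word & LO_TAG_MASK; }
uint32_t lo_addr(uint32_t word) { return word & ~LO_TAG_MASK; }

/* Number of type codes each layout can express directly. */
const uint32_t HI_TYPE_COUNT = UINT32_C(1) << HI_TYPE_BITS;  /* 64 */
const uint32_t LO_TAG_COUNT  = UINT32_C(1) << LO_TAG_BITS;   /* 8 */
```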

    Attempts to optimize the C code implementing a virtual machine.  If you use
    a virtual machine, you've already lost speedwise; doing complicated C hacks
    isn't going to recover much for you. (Presumably that's the reason for
    hundreds of C macros that could have been function calls.)

You are confusing a few things:

1) Using a virtual machine is a common technique these days to
implement interpreters.  We have made no claim that our interpreter
will run code as fast as code produced by native code compilers.  Thus
there is no attempt to make the CScheme interpreter run as fast as
anybody else's compiled code.  As interpreters go, its speed is not
great but not bad either, and it provides much more convenience and
ease of debugging than any other interpreter I know.

2) Most of the hairy C and macrology arise not from using a virtual
machine, but from the fact that the PARTICULAR virtual machine we
chose cannot be conveniently written in C without paying an undue
penalty.

The CScheme interpreter is a recoding of an assembly language
interpreter written for the Motorola MC68000 which in turn started out
as an emulator for the Scheme 81 chip.  The virtual machine was coded
naturally in assembly language (and Scheme 81 microcode), but cannot
be coded as conveniently in C.  We wish we had a portable assembly
language which would allow us to do these things more cleanly and
efficiently, but unfortunately there is no such beast.  C comes
closest, but is a far cry from what we would like.  As it is, the
assembly language interpreter implementing the same virtual machine is
considerably cleaner and still runs somewhat faster on the same
hardware.
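A rough sketch of the style of C at issue, assuming a toy stack machine far simpler than CScheme's actual virtual machine (`vm_run`, the opcodes and the `PUSH`/`POP` macros are all hypothetical): the machine's registers live in globals, and stack operations are macros.  C before C99 had no `inline`, so macros were the portable way to guarantee that such operations expanded in place, much as they would be single instructions in an assembly language interpreter; whether that was CScheme's exact motivation is an assumption here.

```c
/* Hypothetical VM state: a value register and a stack held in
   globals, so every part of the interpreter can reach them cheaply. */
#define STACK_SIZE 256
long vm_stack[STACK_SIZE];
long *vm_sp = vm_stack;   /* points at the next free slot */
long vm_val;              /* the "value register" */

/* Register-style operations as macros, guaranteed to expand inline.
   This sketch omits stack-overflow checks. */
#define PUSH(x) (*vm_sp++ = (x))
#define POP()   (*--vm_sp)

enum opcode { OP_CONST, OP_ADD, OP_MUL, OP_HALT };

/* Run a tiny program; each OP_CONST is followed by its operand. */
long vm_run(const long *code) {
    vm_sp = vm_stack;
    for (;;) {
        switch ((enum opcode) *code++) {
        case OP_CONST: PUSH(*code++); break;
        case OP_ADD: { long b = POP(); long a = POP(); PUSH(a + b); break; }
        case OP_MUL: { long b = POP(); long a = POP(); PUSH(a * b); break; }
        case OP_HALT: vm_val = POP(); return vm_val;
        default: return -1;   /* unknown opcode */
        }
    }
}
```

For example, the program `OP_CONST 2, OP_CONST 3, OP_ADD, OP_CONST 4, OP_MUL, OP_HALT` computes (2 + 3) * 4.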

    Writing nonprimitives in C.  My eye happened to fall on list_to_string,
    which is about three times longer and more complicated in C than in Scheme.
    What earthly reason could there be for this?  I suppose it could be worse;
    KCL is an example.  On the other hand, even KCL doesn't put a Fast Fourier
    Transform and a regular expression matcher in its C "microcode"...

Again, you are ignoring the fact that CScheme is currently an
interpreter-based system, rather than a compiler-based system.  The
trade-offs are pretty different.  In the presence of an acceptable
native code compiler one can easily afford to write many utilities in
the language being compiled, but interpreters rarely provide enough
performance.  Many Basic interpreters have the same outlook: Most of
the utility procedures are provided as part of the implementation and
written in the implementation language rather than Basic.

Even in the presence of an acceptable native code compiler (which we
now have), there is no good reason to flush this code: The interpreter
can still be used without having to port the compiler (which is a much
harder task than bringing up the interpreter) and may give adequate
performance in educational settings (which was CScheme's only goal in
the first place).
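To see where the extra length of a C primitive comes from, here is a hypothetical sketch of a list-to-string conversion over a toy cons-cell representation (the `cell` type is an assumption, and the body is not CScheme's code even though the name matches the one mentioned above).  A Scheme version can lean on the runtime's own checks, but in C every type check, the proper-list check and the allocation failure must be handled explicitly.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical cons-cell representation, for illustration only. */
typedef enum { T_PAIR, T_CHAR, T_NIL } tag_t;
typedef struct cell {
    tag_t tag;
    char ch;                 /* used when tag == T_CHAR */
    struct cell *car, *cdr;  /* used when tag == T_PAIR */
} cell;

/* Convert a proper list of characters to a freshly allocated C string.
   Returns NULL on a type error, an improper list, or allocation failure. */
char *list_to_string(const cell *list) {
    size_t len = 0;
    const cell *p;
    /* First pass: validate every element and count them. */
    for (p = list; p->tag == T_PAIR; p = p->cdr) {
        if (p->car == NULL || p->car->tag != T_CHAR)
            return NULL;     /* element is not a character */
        len++;
    }
    if (p->tag != T_NIL)
        return NULL;         /* list does not end in nil */
    /* Allocate only once the required size is known. */
    char *s = malloc(len + 1);
    if (s == NULL)
        return NULL;
    /* Second pass: copy the characters. */
    size_t i = 0;
    for (p = list; p->tag == T_PAIR; p = p->cdr)
        s[i++] = p->car->ch;
    s[len] = '\0';
    return s;
}
```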

As far as FFT and Regexp search:

- FFT is not part of the standard scheme "microcode".  It is part of a
special version of Scheme used in the introductory signals course at
MIT.  Since somebody had bothered to write it, we included it in the
release (although it is not loaded by default) so that other people
could use it if they wanted to or were curious.  It was written in C
because our interpreter could hardly compete with compiled C; for that
matter, few Lisp implementations (if any) can currently match the
speed that can be obtained from C or FORTRAN in numeric code.  What's wrong with
coding a commonly used procedure in a language which will make it more
efficient?  Please show me a Lisp which can compete with C in this
regard.  If you manage to do this, I strongly suspect that the number
of arcane declarations in the Lisp code will make it at least as
unreadable as C, so you're no better off.

- Regexp search: It's written in C because it has been copied almost
verbatim from the similar code in GNU Emacs, written in C.  Why bother
rewriting (and debugging) something which we can use directly with
little effort?  Note that this code is used only by Edwin, which has
not yet been released, so its presence in the release is spurious,
and the file which contains it can be dropped from the set of loaded
files since no released code uses it (to my knowledge).  On the other
hand, why not make it available so people can use it or read it if
they are curious?

    References to apparent GC in all kinds of strange places.  When I followed
    them, the trail disappeared in a maze of macros.

Hmm.  Maybe the Lisps you are used to don't check whether there is
enough space to allocate storage before they go ahead and do it.  They
can't be very robust.  I'm also surprised that you complain about this
particular "maze".  It seems pretty straightforward to me.
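The pattern behind those GC references can be sketched as follows, assuming a simple bump-pointer heap (`allocate`, `free_ptr` and `gc_needed` are hypothetical names, not CScheme's macros).  The space check happens before any memory is touched; if there is not enough room, nothing is allocated and a GC is flagged, so the caller can collect and then retry the whole operation with its arguments still intact.

```c
#include <stddef.h>

/* Hypothetical bump-pointer heap. */
#define HEAP_WORDS 1024
long heap[HEAP_WORDS];
long *free_ptr = heap;
int gc_needed = 0;   /* set when an allocation would overflow the heap */

/* Check for room BEFORE allocating.  On failure, allocate nothing and
   flag a GC; the caller is expected to collect and retry. */
long *allocate(size_t words) {
    size_t available = (size_t) (heap + HEAP_WORDS - free_ptr);
    if (words > available) {
        gc_needed = 1;
        return NULL;
    }
    long *obj = free_ptr;
    free_ptr += words;
    return obj;
}
```

An allocator that skipped this check would simply write past the end of the heap when memory ran out.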

    References to the compiler and Edwin, thus violating every principle of
    abstraction known to exist.

What?  Please explain, I don't understand what that means.

    Strange code concerning "MIT ASCII" (that's what it said) vs regular
    ASCII characters.  Given that CScheme is "portable", why does this get
    included in everybody's copy?

Please read section 13.5 of "Common Lisp: The Language".  MIT
ASCII is ASCII with control, meta, hyper, and super bits, and is
extremely useful when writing Emacs-like editors.  CScheme only
assumes the standard ASCII character set (and the assumptions are few,
so converting to EBCDIC is not hard and has been done in the past),
and builds MIT ASCII on top, thus portability is not compromised.
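A minimal sketch of the idea, assuming a hypothetical layout in which the 7-bit ASCII code occupies the low bits and the control, meta, super and hyper ("bucky") bits sit above it.  The specific bit positions and the names `make_char`, `char_code` and `char_bits` are illustrative assumptions, not CScheme's encoding or Common Lisp's `char-bits` values.  An editor can then bind Meta-x separately from x, while plain ASCII characters are exactly the ones with no modifier bits set.

```c
#include <stdint.h>

/* Hypothetical layout: ASCII code in the low 7 bits, modifier
   ("bucky") bits above it.  Bit positions are illustrative only. */
#define CODE_MASK   0x7Fu
#define BIT_CONTROL (1u << 7)
#define BIT_META    (1u << 8)
#define BIT_SUPER   (1u << 9)
#define BIT_HYPER   (1u << 10)

typedef uint32_t mit_char;

mit_char make_char(unsigned char ascii, unsigned bits) {
    return (mit_char) ((ascii & CODE_MASK) | bits);
}
unsigned char char_code(mit_char c) { return (unsigned char) (c & CODE_MASK); }
unsigned char_bits(mit_char c)      { return c & ~CODE_MASK; }

/* Plain ASCII characters are exactly those without modifier bits,
   so code that assumes only standard ASCII never has to see them. */
int char_is_ascii(mit_char c) { return char_bits(c) == 0; }
```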

    Lack of documentation is an unforgivable omission, and is by far my
    biggest gripe about CScheme.  It shouldn't take two hours to figure
    out what kind of garbage collection is being done, or to figure out
    what the "danger bit" does.  I've worked with many programs on that
    scale during my 12 years in computing, and to be fair, most large
    programs are difficult to understand, even with documentation (TeX
    for instance).  This just means that *more* documentation is required,
    not less!  To put it another way, lesser minds can't be amazed by your
    cleverness if they don't even know what's going on...

The hard line:

CScheme was provided because other people asked for it, not because we
wanted to "spread the Gospel", or were willing to support it.  Quoting
from the CScheme copyright notice:

    4. MIT has made no warrantee or representation that the operation of
    this software will be error-free, and MIT is under no obligation to
    provide any services, by way of maintenance, update, or otherwise.

This is not merely "legalese"; we mean it.  As a consequence, we feel
under NO obligation to write documentation until WE need it or have
the time to do it.  Both of these conditions have been met recently,
so the process has started.

On the other hand... The CScheme system is FAR from complete or
"finished".  We could have waited until it was finished (including
documentation) before making it available, but we decided to release
it earlier in the hope that it would be useful, with the understanding
that it would be incomplete.  Please don't flame because we are not
yet done.  You are being impatient.

A little CScheme history may explain some idiosyncrasies:

- CScheme was first written, as a toy, in the fall of 1983.  We did NOT
use it.  People asked for it, so we gave it out.

- In late 1986 we decided to make it our main implementation.
Previously our main implementation consisted of the assembly language
interpreter, a compiler which had been running since early 1984, and
Edwin, which was developed on that implementation and then ported by
TI to the PC. 

- Up until that point we had treated CScheme mostly as a curiosity.
After this, we developed a new compiler which coexists with the C
interpreter.  The compiler went into beta test a few weeks ago and
will be released later this year.  I'm fairly certain that by the end
of this summer CScheme (with its compiler) will provide performance
similar to that provided by other dialects of Lisp (Lucid, T), at
which point we may be able to reexamine some of the issues, but so far
it has been premature.  We've only been working on it seriously for
two years.  As far as I know, most other implementations of Lisp have
been around considerably longer, so we're not doing too badly.