
Yes, they are special, but this would be MORE special:



   Date: Sat, 10 Jun 89  03:25:17 CDT
   From: Gyro@Reasoning.COM

      Date: Fri, 9 Jun 89 09:16 CDT
      From: ai.gooch@MCC.COM (William D. Gooch)
      Subject: Yes, they are special, but...

	  ...."highly specialized hardware"....

      I've heard this idea bandied about enough that I suppose I should
      understand it by now, but - sorry if I'm being dense - I still don't
      know what it means.  The term "specialized," usually stated as in
      opposition to "general-purpose," implies that there are things one
      cannot do on a Symbolics machine which other hardware does support.
      If you don't mind my asking, what are those things?  

   You can't run Un*x.

   Don't get the wrong idea.  I detest Un*x.

(Replying to my own message!)  I forgot what is actually, to me, the
most important thing I can't do on a Lisp Machine.

I can't run Scheme.  (I don't detest Scheme!)

Yes, I know about Rees' PseudoScheme.  It is not adequate for my
purposes on several counts, which I'm not inclined to detail here.
(I'll mention one: it doesn't support upward continuations.)

The real problem, for my purpose, is that the Lisp Machines have too
many assumptions built in about things like storage conventions and
function call protocol.  (My purpose is to do language and compiler
design research.  The ideal starting point for me, it turns out, is T,
Yale's high-performance Scheme implementation; partly because of the
excellent optimizing compiler, partly because I can change literally
anything I want, down to the tag bit assignments.)

---------------- My Dream CPU  (**FANTASY WARNING**)

I've been kicking around some ideas about this which I think (hope)
are interesting enough to share.  Imagine this:

The goal is a processor that can run the Symbolics environment as a
process under an only slightly modified Un*x.  It should be able to
run C at the speeds people are coming to expect -- 10 MIPS and up.  It
should also be able to run Lisp faster than anything else that has yet
been built.  Furthermore, it should offer a very wide range of
flexibility to the Lisp implementor; enough, specifically, to run both
systems like Genera and systems like T (which has an entirely
different function call protocol) with equally excellent performance.

Here's a metaphor for the design: we notice that several of the latest
RISC processors (e.g., Motorola 88000) include on-chip floating point.
The point is that floating-point operations are sufficiently frequent
(in some applications, anyway) and sufficiently expensive to simulate
that it's worth adding hardware to do them in parallel.  Well, we
could take the same point of view about the high-level operations
performed in Lisp code: we'll add hardware to do them in parallel.

What I'd propose is different from the current Symbolics designs in
that there is no microcode: the CPU is, at its core, a RISC design.
It can run a C-like instruction mix at or near one instruction per
cycle.  It's different from the extant RISC designs in having a 40-bit
word, the extra byte (which is not addressable except as part of the
word, i.e., isn't visible to byte load/store instructions) being used
for tag checking.  But, ideally, no assumptions whatever about the
meanings of particular tag values are built into the hardware; rather,
there might be a 1-cycle instruction "branch unless the tag on Rn is
T", and perhaps a 2-cycle instruction (i.e. with a delay slot) to use
a register's tag value to index into a dispatch table and branch to
the address found.  Any process should be able to set up read and
write traps, such that a trap is issued on a load or store of a word
with one of an arbitrarily chosen set of tag values.
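
To make that concrete, here's a rough sketch in C of what the software
view of such a tagged word and the two tag operations might look like.
The 8-bit tag field, the particular tag names, and the handler table
are all made up for illustration, not anything Symbolics or Motorola
ever specified; the point is just the two operations -- the conditional
tag-check branch and the table dispatch -- which the hardware would do
in one or two cycles rather than in several instructions as here.

    /* Illustrative sketch only: a 40-bit tagged word as seen by software.
       The tag width, the tag values, and the handler table are invented;
       the real hardware would overlap these checks with the ordinary
       integer pipeline. */

    #include <stdint.h>
    #include <stdio.h>

    typedef struct {
        uint8_t  tag;    /* the extra byte: not visible to byte load/store */
        uint32_t data;   /* the 32-bit datum proper */
    } word40;

    enum { TAG_FIXNUM = 0, TAG_CONS = 1, TAG_OTHER = 2, NTAGS = 3 };

    typedef void (*tag_handler)(word40 w);

    static void on_fixnum(word40 w) { printf("fixnum %u\n", (unsigned)w.data); }
    static void on_cons(word40 w)   { printf("cons cell at %#x\n", (unsigned)w.data); }
    static void on_other(word40 w)  { printf("some other tag: %u\n", (unsigned)w.tag); }

    /* "use a register's tag value to index into a dispatch table and
       branch to the address found" -- modeled here as an indirect call */
    static tag_handler dispatch[NTAGS] = { on_fixnum, on_cons, on_other };

    int main(void) {
        word40 r1 = { TAG_CONS, 0x1000 };

        /* the 1-cycle instruction: "branch unless the tag on Rn is T" */
        if (r1.tag != TAG_FIXNUM)
            dispatch[r1.tag](r1);   /* the 2-cycle tag-dispatch instruction */

        return 0;
    }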

A bounds-check instruction -- trap if Rn < 0 or Rn >= Rm -- would make
array referencing both fast and safe.  I don't know offhand how to
speed up array decoding, but I'll bet it can be done (although this is
one place where it is particularly hard to maintain full generality).
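
For illustration, here's roughly what a safe array reference could
compile down to, given such a bounds-check instruction.  The trap
handler and the unsigned-compare trick are my own stand-ins, not a
description of any existing hardware; the point is that the check
costs a single compare-and-trap (one instruction, in hardware) ahead
of an ordinary load.

    /* Illustrative sketch: what a safe AREF might compile into, given a
       one-instruction bounds check ("trap if Rn < 0 or Rn >= Rm").
       The trap handler is a stand-in for the hardware trap. */

    #include <stdint.h>
    #include <stdlib.h>

    static void trap_bounds(void) { abort(); }

    static int32_t aref(const int32_t *data, int32_t index, int32_t length) {
        /* one compare-and-trap: the unsigned comparison catches both
           index < 0 and index >= length in a single test */
        if ((uint32_t)index >= (uint32_t)length)
            trap_bounds();
        return data[index];          /* then an ordinary load */
    }

    int main(void) {
        int32_t v[3] = { 10, 20, 30 };
        return aref(v, 2, 3) == 30 ? 0 : 1;   /* in bounds, no trap */
    }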

To deal with the low code density of RISC designs, there would be
support for embedded instruction sets, so one could write a very
low-overhead instruction decode & dispatch loop.  This way, code which
does not have to be blindingly fast can be small.  Yet it's always
possible to compile any function down to the native instruction set if
it's time-critical.  The idea is of course the same as that behind
Greenblatt's CADR/LAMBDA microcompiler, except that since we're going
to RISC instructions rather than microcode, 1) the compilation is
easier and 2) there's no limit on the amount of code we can compile
this way.
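
As a toy illustration, here's the shape of such a decode & dispatch
loop in C, over an invented four-opcode byte code.  None of the
opcodes are meant to be real; the point is just how little the loop
itself costs, and that any function whose speed matters could bypass
it entirely by being compiled to the native instruction set.

    /* Illustrative sketch: a low-overhead decode & dispatch loop over an
       invented byte code.  A time-critical function would skip this and
       be compiled straight to native instructions. */

    #include <stdint.h>
    #include <stdio.h>

    enum { OP_PUSH, OP_ADD, OP_PRINT, OP_HALT };

    static void run(const uint8_t *pc) {
        int32_t stack[64], *sp = stack;
        for (;;) {
            switch (*pc++) {                      /* fetch & decode: one byte */
            case OP_PUSH:  *sp++ = (int8_t)*pc++;       break;
            case OP_ADD:   --sp; sp[-1] += sp[0];       break;
            case OP_PRINT: printf("%d\n", (int)sp[-1]); break;
            case OP_HALT:  return;
            }
        }
    }

    int main(void) {
        static const uint8_t code[] =
            { OP_PUSH, 2, OP_PUSH, 40, OP_ADD, OP_PRINT, OP_HALT };
        run(code);                                /* prints 42 */
        return 0;
    }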

To summarize, then, the design borrows elements from RISC *and* from
the 3600 (and presumably Ivory); compared to the latter, it's a little
bit of a throwback to the CADR because we abandon the very wide
microcode word.  So it's really in the center of these three points in
the design space.

To top it off, I would like to see it built using register
scoreboarding and multiple functional units, like the 88000 (the
classicist in me would say, "like the 6600").  Indeed, come to think
of it, it might be possible to build it as a coprocessor for the
88000, which is designed with a very-high-speed coprocessor interface.
Boy, that would be a coup!

-- Scott