
Re: LispM's crummy I/O performance (was RE: LispM Market Share)

     Can you be more specific about this task?

Sure.  My data set is a sequence of 1 million pairs of integers.  The
first in each pair ranges from 0 to 32767, and the second ranges from 0
to 127.  So, a section of the data might be:
	3000 12 4562 50 18002 61 456 12 . . .
(Actually, each of these values corresponds to a word/part-of-speech
pair in a corpus of text, the Brown Corpus.  I have just mapped the
words and parts-of-speech into integers so that the LispM could process
them in my lifetime.)
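
For concreteness, here is a minimal sketch of how the C side might read this format.  It assumes the file is nothing but whitespace-separated decimal integers in word/tag order, as in the sample above; the names pair_t and read_pairs are made up for illustration and come from no existing program.

```c
#include <stdio.h>
#include <stdlib.h>

/* One word/part-of-speech pair, sized for the ranges given above.
   (Hypothetical type, for illustration only.) */
typedef struct {
    unsigned short word;  /* 0..32767 */
    unsigned char  tag;   /* 0..127   */
} pair_t;

/* Read up to max_pairs whitespace-separated integer pairs from fp.
   Returns the number of complete pairs stored.  Stops at EOF, at
   malformed input, or at the first value outside the stated ranges. */
size_t read_pairs(FILE *fp, pair_t *out, size_t max_pairs)
{
    size_t n = 0;
    int word, tag;

    while (n < max_pairs && fscanf(fp, "%d %d", &word, &tag) == 2) {
        if (word < 0 || word > 32767 || tag < 0 || tag > 127)
            break;  /* out of range: treat as bad data and stop */
        out[n].word = (unsigned short)word;
        out[n].tag  = (unsigned char)tag;
        n++;
    }
    return n;
}
```

A caller would allocate room for the 1 million pairs once, open the data file, and make a single call to read_pairs.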

     Do I now load this into a 1 million element array?

I tried a number of simple approaches.  My main goal was to play with
our new CM.  I didn't really want to fight with my LispM.  So, I tried
just using 'read in various ways, with my data in an ascii file.  When
that took an interminable amount of time, I tried viewing the whole
data set as one object and performing one 'read, and that took forever
too.  Finally, I gave up, and just performed a (setf data '(3000 12
4562 50 18002 61 456 12 . . .)) in a file and loaded that file.  All of
these methods take a number of hours to perform on a 3650 over NFS (we
don't have an LMFS that can hold the data).

     On a Sun 3/50 this takes under a minute to manipulate with, say,
     wc, cat, and grep '111'.

Actually, we have a SUN 4/280, and using a version of egrep with a
variation on the Boyer-Moore algorithm, I can egrep through 10 megabytes
(the Brown Corpus) in 3 seconds!  Try to come close to that on a LispM.

     What do you mean?  Something like:

     (with-open-file (stream file)
       (let ((results (read stream)))
         (do-unto results)))

     Do you mean with dump-forms-to-file?  If so, I'm not surprised.  A lot
     of time is spent growing (copying) the hash table that maintains eqness.

I didn't use dump-forms-to-file.  I need my data to be in a readable
form because the application which would use the data is written in C
for a SUN.
The whole point of my problem is that the only reason I will use a
LispM is because the programming environment helps me get applications
up and running very quickly.  Since I am just using the LispM as a
front end for a CM in this instance, it *shouldn't* be the bottleneck,
for development time or run time.  It turns out that the LispM is the
bottleneck with respect to both issues.

     Perhaps your application would make a good benchmark.

I don't know about using it as a benchmark, but it is a task that
Symbolics should address.  Processing an ascii stream is a task every
machine should be able to perform efficiently.  You shouldn't need to
create 45 different flavors just to read in a few integers.  And there
should be a documented function which performs efficient reads.  The
hack by alanr@media-lab is probably pretty fast, and I'll probably use
it, but I don't think I should have to manipulate integers bit-by-bit
in order to read in ascii data.
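
To show how little work the task involves, here is a sketch in C (not the alanr@media-lab hack, which is not reproduced here) that reads unsigned decimal integers by accumulating digits one character at a time.  The name next_uint is hypothetical, and the function assumes non-negative values small enough to fit in a long.

```c
#include <stdio.h>

/* Skip non-digit bytes, then accumulate one unsigned decimal integer.
   Returns 1 and stores the value in *out, or returns 0 if EOF is
   reached before any digit.  No overflow check: adequate for values
   no larger than 32767, as in the data described above. */
int next_uint(FILE *fp, long *out)
{
    int c;
    long value = 0;

    /* Skip whitespace or any other separator. */
    while ((c = getc(fp)) != EOF && (c < '0' || c > '9'))
        ;
    if (c == EOF)
        return 0;

    /* Accumulate digits; the byte that ends the number is discarded. */
    do {
        value = value * 10 + (c - '0');
        c = getc(fp);
    } while (c >= '0' && c <= '9');

    *out = value;
    return 1;
}
```

Calling next_uint in a loop until it returns 0 yields every integer in the file in order.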

My projects generally take advantage of a number of different types of
machines (our network has LispM's, various UNIX machines, and a CM),
and I need to be able to process the same data everywhere.  I want to
use each machine for what it is good at, but, based on the LispM I/O
rate, performance-wise the LispM isn't good at any aspect of my project.

I used to be a fanatical LispM supporter, but I have found programming
in C in an X environment with a primitive debugger preferable to
sitting in front of a LispM constantly waiting for something. 
Compilation, I/O, and Window operations all take forever on even our
fastest machines.  We are getting a UX400 board soon, and I hope it
will bring me back into the fold.  But, until then, my LispM remains idle.

-- David Magerman
University of Pennsylvania LINC Laboratory
(The opinions expressed above are my own, so don't blame my boss.)