Re: LispM's crummy I/O performance (was RE: LispM Market Share)
> Can you be more specific about this task?
Sure. My data set is a sequence of 1 million pairs of integers. The
first in each pair ranges from 0 to 32767, and the second ranges from 0
to 127. So, a section of the data might be:
3000 12 4562 50 18002 61 456 12 . . .
(Actually, each of these values corresponds to a word/part-of-speech
pair in a corpus of text, the Brown Corpus. I have just mapped the
words and parts-of-speech into integers so that the LispM could process
them in my lifetime.)
> Do I now load this into a 1 million element array?
I tried a number of simple approaches. My main goal was to play with
our new CM. I didn't really want to fight with my LispM. So, I tried
just using 'read in various ways, with my data in an ascii file. When
that took an interminable amount of time, I tried viewing the whole
data set as one object and performing one 'read, and that took forever
too. Finally, I gave up, and just performed a (setf data '(3000 12
4562 50 18002 61 456 12 . . .)) in a file and loaded that file. All of
these methods take a number of hours to perform on a 3650 over NFS (we
don't have an LMFS that can hold the data).
> On a Sun 3/50 this takes
> under a minute to manipulate with, say, wc, cat, and grep '111'.
Actually, we have a SUN 4/280, and using a version of egrep with a
variation on the Boyer-Moore algorithm, I can egrep through 10meg (the
Brown Corpus) in 3 seconds! Try and come close to that on a LispM.
> What do you mean? Something like:
>
>   (with-open-file (stream file)
>     (let ((results (read stream)))
>       (do-unto results)))
>
> Do you mean with dump-forms-to-file? If so, I'm not surprised. A lot
> of time is spent growing (copying) the hash table that maintains eqness.
I didn't use dump-forms-to-file. I need my data to be in a readable
form because the application which would use the data is written in C for a SUN.
The whole point of my problem is that the only reason I will use a
LispM is because the programming environment helps me get applications
up and running very quickly. Since I am just using the LispM as a
front end for a CM in this instance, it *shouldn't* be the bottleneck,
for development time or run time. It turns out that the LispM is the
bottleneck with respect to both issues.
> Perhaps your application would make a good benchmark.
I don't know about using it as a benchmark, but it is a task that
Symbolics should address. Processing an ascii stream is a task every
machine should be able to perform efficiently. You shouldn't need to
create 45 different flavors just to read in a few integers. And there
should be a documented function which performs efficient reads. The
hack by alanr@media-lab is probably pretty fast, and I'll probably use
it, but I don't think I should have to manipulate integers bit-by-bit
in order to read in ascii data.
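
A minimal sketch of what such a reader might look like in portable Common Lisp follows. It is not alanr's hack (which is not reproduced in this message) and not a documented Symbolics function. The names read-next-integer and read-pairs are hypothetical, and no claim is made about how fast it would run on a 3650. It only illustrates reading whitespace-separated decimal integers one character at a time, without going through the general reader, into arrays sized for the value ranges described above.

```lisp
;; Hypothetical helper (not from the original post): return the next
;; non-negative decimal integer on STREAM, or NIL if the stream ends first.
(defun read-next-integer (stream)
  (let ((value nil))
    (loop for char = (read-char stream nil nil)
          while char
          do (let ((digit (digit-char-p char)))
               (cond (digit (setf value (+ (* 10 (or value 0)) digit)))
                     ;; The first non-digit after at least one digit ends the number.
                     (value (return)))))
    value))

;; Hypothetical helper: read N-PAIRS word/part-of-speech pairs into two
;; preallocated arrays. Words range over 0..32767 and tags over 0..127,
;; so 16-bit and 8-bit element types are wide enough.
(defun read-pairs (stream n-pairs)
  (let ((words (make-array n-pairs :element-type '(unsigned-byte 16)))
        (tags  (make-array n-pairs :element-type '(unsigned-byte 8))))
    (dotimes (i n-pairs)
      (let ((word (read-next-integer stream))
            (tag  (read-next-integer stream)))
        (unless (and word tag)
          (error "Input ended after ~D complete pairs." i))
        (setf (aref words i) word
              (aref tags i) tag)))
    (values words tags)))
```

With the data in a file, this would be called as (with-open-file (stream file) (read-pairs stream 1000000)); for a quick check, with-input-from-string on a short string such as "3000 12 4562 50" with n-pairs of 2 exercises the same path.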
My projects generally take advantage of a number of different types of
machines (our network has LispM's, various UNIX machines, and a CM),
and I need to be able to process the same data everywhere. I want to
use each machine for what it is good at, but, based on the LispM I/O
rate, performance-wise the LispM isn't good at any aspect of my project.
I used to be a fanatical LispM supporter, but I have found programming
in C in an X environment with a primitive debugger preferable to
sitting in front of a LispM constantly waiting for something.
Compilation, I/O, and window operations all take forever on even our
fastest machines. We are getting a UX400 board soon, and I hope it
will bring me back into the fold. But, until then, my LispM remains idle.
-- David Magerman
University of Pennsylvania LINC Laboratory
(The opinions expressed above are my own, so don't blame my boss.)